estebanpw / chromeister

A dotplot generator for large chromosomes
GNU General Public License v3.0
39 stars 4 forks source link

bring chromeister to bioconda and update galaxy tool #6

Closed bernt-matthias closed 4 years ago

bernt-matthias commented 4 years ago

Just found the chromeister galaxy tool https://toolshed.g2.bx.psu.edu/repository?repository_id=908b5504c1930f12 and noticed a few possibilities to improve it. I would suggest to

If you are interested I'm happy to help.

estebanpw commented 4 years ago

Hello Bernt,

Thank you very much for your generosity! I am personally interested in this, but I have to run it through my advisor first. Question: Are there any copyrighting issues if moving it to the IUC tool collection?

I think I can get back to you next week --- best regards!

bernt-matthias commented 4 years ago

For bioconda (or conda-forge) some kind of open source license (GPL, Apache, MIT) would be needed for your project.

The bioconda recipe and a galaxy tool at IUC would be under some open source license (MIT for IUC .. not sure about bioconda). But this would be completely independent of the license of your project.

I guess it would be a good idea to add a license to this project anyway, since its (to my understanding) unprotected otherwise.

The best arguments pro conda would be that the tool would be easily installable and you get docker and singularity containers for free. A IUC galaxy tool could be deployed installed on usegalaxy.eu / usegalaxy.org which would render it easily usable to everyone (using quite large compute resources).

estebanpw commented 4 years ago

Hey there again

I have discussed this with my advisor and its all good. In fact I have added a GPL license to the original repository (this one) and also updated the one in https://github.com/Bitlab-UMA/chromeister which I believe its the one the galaxy toolshed was pointing to.

Let me know how we can proceed -- if you want we can discuss this through skype, slack, etc. My email is estebanpw{at}uma.es

Best regards

bernt-matthias commented 4 years ago

That's good news. I guess for bioconda we only need a new release that includes the license file. Initially I thought that there will be a few changes to the shell scripts necessary (in conda all shell scripts and binaries can be expected to be in $PATH .. so the $BINDIR is not necessary) but I think this should just work.

The next step would be to submit a conda recipe to bioconda. If you like I could do this. I think I would only need to know a test call (usually conda tests just call the tool with the --help or --version flag to check if its running).

Then I would suggest that you submit the tool to IUC. I could give you some feedback ahead of the submission. Otherwise we can do this during the IUC review.

estebanpw commented 4 years ago

Just made a new release including the license. You can find it here

Regarding the $PATH and $BINDIR: I think the easiest way is to use the script run_and_plot_chromeister.sh (this is also the one you were referring to, correct?) to run it. Initially I did not add an installer to the $PATH and to /usr/bin so that no root privileges were required. However I can also add an .sh installer that specifies that sudo is required. Let me know about this.

Regarding the conda recipe, I have never done this but, if you feel like doing it, thats great :-) otherwise I will look into it.

Regarding the test call, the binary CHROMEISTER can be called in fact with --help and it will print some info on the stdout and then exit successfully. Is this what you need? or do you need a complete test with input data and output validation?

bernt-matthias commented 4 years ago

Opened a PR on bioconda: https://github.com/bioconda/bioconda-recipes/pull/24525 .. please comment if you want any changes.

Lets see if CI runs positively.

bernt-matthias commented 4 years ago

Is dplyr really needed? Could only find the usage of ape in your R scripts.

estebanpw commented 4 years ago

I just checked and it was only being used for a select function which I have just replaced with other base functions. So I just removed package dplyr, updated the repository and the release. Regarding the ape package, this is used but not in the main pipeline nor will it be used from using the tool directly (thats why it was not described as necessary in the readme). You could skip this package if you wanted to make it easier, as it is only intended for manual and non automatic use.

Btw I saw that the pull request failed, anything I can do?

bernt-matthias commented 4 years ago

I will leave ape in the requirements. Its better if all scripts just run (even supplementary scripts).

The failure was due to the line: CC=gcc in the Makefile. I quick fixed this by removing it via sed -i -e 's/^CC=.*//' src/Makefile. According to the bioconda community (https://gitter.im/bioconda/Lobby) the line should be CC="${CC}". You may change this for a further release. But its fine for now.

Test is still failing because CHROMEISTER --help is returning exit code 1. I will implement a workaround. But this might be changed here as well if you like.

I could start to review the Galaxy tool and send you my comments. How do you like them? Just here or mail or something else?

estebanpw commented 4 years ago

The --help thing came as a surprise to me, as I changed this in one of the previous comments when we started (line 616 at the main CHROMEISTER.c file). Turns out, I had not pushed the new binary in the commit, so unless a make all is performed it will fail with the existing binary : /

I just updated the repository, so it should no longer return exit code 1 even without a make.

Regarding the review, I would rather have it on mail (estebanpw{ at } uma.es

bernt-matthias commented 4 years ago

exit(1) is still in the 1.0 release. But its fine for now and can be changed in the next commit. And yes, bioconda runs make all.

I send you an email with some suggestions for the Galaxy tool. Don't hesitate to ask questions if needed.

bernt-matthias commented 4 years ago

Hi @estebanpw the conda package finally builds successfully for linux and osx. There were a few changes necessary to make it run:

In the makefile:

In the sources:

If you like you can change it here and make a new release and we just bring the new version to conda -- but you can also do this later.

If you like you can test the package as follows:

conda install -c https://124828-42372094-gh.circle-artifacts.com/0/tmp/artifacts/packages -c bioconda -c conda-forge chromeister

or if you prefer to create a separate environment

conda create -y --quiet --override-channels -c https://124828-42372094-gh.circle-artifacts.com/0/tmp/artifacts/packages -c conda-forge -c bioconda -c defaults -n chromeister chromeister

Would be great if we could test the binary, one of the R scripts and one shell script...

I could then request a review and merge.

estebanpw commented 4 years ago

Hey there @bernt-matthias I have changed the Makefile as you suggested, removed the CC set (and replaced the variable with "gcc" in the compile line), included the missing pthread library and finally changed the += in the CFLAGS. Btw, isnt the line that goes:

BIN=../bin

going to interfere with the path in conda?

I am also including the test folder in this release.

Last, I have checked the exit command:

estebanpw@...:~/temp/chromeister/bin$ ./CHROMEISTER --help
USAGE:
       CHROMEISTER -query [query] -db [database] -out [outfile]
OPTIONAL:
       -kmer       [Integer:   k>1 (default 32)]
       -diffuse    [Integer:   z>0 (default 4)]
       -dimension  Size of the output [Integer:   d>0 (default 1000)]
       -out        [File path]
       --help      Shows help for program usage

PLEASE NOTICE: The reverse complementary is calculated for the QUERY.
estebanpw@...:~/temp/chromeister/bin$ echo $?
0
estebanpw@...:~/temp/chromeister/bin$ ./CHROMEISTER
ERR**** A query, database and output file is required ****
estebanpw@...:~/temp/chromeister/bin$ echo $?
255

Is it still failing for your? Please check, I made sure this time to update the release

bernt-matthias commented 4 years ago

Btw, isnt the line that goes: BIN=../bin going to interfere with the path in conda?

No. In bioconda we just compile it and copy the binary and scripts to the directory that is in PATH. See here

https://github.com/bioconda/bioconda-recipes/blob/c2cd40a55ba776a9a32f43a0511230b5cb7eea8b/recipes/chromeister/build.sh#L13

I can test the new release, but I do not see a new release https://github.com/estebanpw/chromeister/releases

estebanpw commented 4 years ago

No. In bioconda we just compile it and copy the binary and scripts to the directory that is in PATH. See here

OK

I can test the new release, but I do not see a new release https://github.com/estebanpw/chromeister/releases

This is because I updated the release instead of releasing a new one (Im pretty sure thats a bad practice) --- do you need a new release or can you use the existing one? I updated the binaries as well

bernt-matthias commented 4 years ago

I guess you only updated the binary, but you also should update the zip and tar.gz files. As far as I see those still contain exit(1)

estebanpw commented 4 years ago

This is very silly but, I expected the release to automatically update the .zip and tar.gz files when I edited them and made some commits (now its obvious to me that it doesnt, lol) . I have drafted a new one with the new binary and made sure that the zip and tar.gz files got updated. Sorry for the mess!

bernt-matthias commented 4 years ago

You should have still used $(CC) in the compilation ..

Maybe set it with:

CC ?= gcc

Then the Makefile overwrites CC if it's not already set. The problem is that the compiler binary binary may have different names on different systems (e.g. OSX and also conda).

Now I'm sorry :) But we make good progress.

estebanpw commented 4 years ago

My bad! Happy to learn about all this stuff thou :-)

I changed it to CC ?=gcc and updated the readme to indicate that the user should make sure it resolves to gcc! You can find the new release here https://github.com/estebanpw/chromeister/releases/tag/1.1.a

bernt-matthias commented 4 years ago

Wonderful. This worked. I also have checked the conda package (the binary and the R script that are used in the Galaxy tool). I will trigger a request for a merge now.