galaxyproteomics / tools-galaxyp

Galaxy Tool Shed repositories maintained and developed by the GalaxyP community
MIT License
34 stars 57 forks source link

metanovo: do not let user decide on cores #660

Closed bernt-matthias closed 2 years ago

bernt-matthias commented 2 years ago

Just wanted to try this tool and have observed some things that should be changed:

  1. Users should/must not be allowed to change the number of cores and the memory. That is the job of Galaxy and the Galaxy admin.

I disabled the possibility to set the core and set the Xmx and Xms to an empty strings. The number of available cores is available via the $GALAXY_SLOTS environment variable (I subtract 1 as mentioned in the docs).

The memory options need to be set by Galaxy / Galaxy admin via the _JAVA_OPTIONS environment variable (https://stackoverflow.com/questions/28327620/difference-between-java-options-java-tool-options-and-java-opts) which will overwrite these variables.

  1. Tests are needed @reid-wagner .. can you help here?

Finally I think that the bioconda recipe is should be improved. Instead of downloading (precompiled?) versions of SearchGUI and other software this should be specified as requirements in the recipe. I think most/all of the dependencies are available in bioconda.

reid-wagner commented 2 years ago

Hi @bernt-matthias,

Thank you for the feedback.

  1. That makes sense - I didn't realize those shouldn't be exposed to the user. Thank you for fixing that.
  2. I should have finished tests soon.
  3. Yes, when writing the bioconda recipe I somehow overlooked including SearchGUI and PeptideShaker from the dependency URLs given by the project as conda dependencies. I wasn't able to find DeNovoGUI or compomics-utilities.
bernt-matthias commented 2 years ago

I should have finished tests soon.

Wonderful. You should be able to add them to this branch if you like.

reid-wagner commented 2 years ago

It looks like openjdk is needed as a dependency for the bioconda recipe to have Galaxy tests pass, so I will working on getting that into the recipe. I can make the improvements already mentioned as well.

bernt-matthias commented 2 years ago

parallel has some odd output:

Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2022, February 22). GNU Parallel 20220222 ('Donetsk Luhansk').
  Zenodo. https://doi.org/10.5281/zenodo.6213471

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

Maybe we can add a single call to parallel --citation or use an alternative as mentioned here https://www.gnu.org/software/parallel/parallel_design.html#citation-notice

bernt-matthias commented 2 years ago

Another warning in the output:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.gson.internal.bind.ReflectiveTypeAdapterFactory (file:/gpfs1/work/songalax/galaxy/database/jobs_directory/000/151/151979/working/metanovo/sg/lib/gson-2.4.jar) to field java.awt.Color.value
WARNING: Please consider reporting this to the maintainers of com.google.gson.internal.bind.ReflectiveTypeAdapterFactory
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
bernt-matthias commented 2 years ago

also the csv output looks odd:

,index,Accession,Record,ID,PeptideCount,Peptides,ScanCount,Scans,Organism,Length,File,Sample 2022-04-11_Hela (msms),SAF 2022-04-11_Hela,NSAF 2022-04-11_Hela,Summed_NSAF,Protein_Prob,Organism_Prob,MSMS_Percent,Combined_Prob
0,4023,C9JUM1,">tr|C9JUM1|C9JUM1_HUMAN Actin, cytoplasmic 1 (Fragment) OS=Homo sapiens OX=9606 GN=ACTB PE=1 SV=2
MDDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQS
KRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRV
",tr|C9JUM1|C9JUM1_HUMAN,6,"IWHHTFYNELR
IWHHTFYNEIR
AVFPSIVGR
DSYVGDEAQSK
AGFAGDDAPR
HQGVMVGMGQK",23,"2022-04-11_Hela.mgf.26367
2022-04-11_Hela.mgf.30207
2022-04-11_Hela.mgf.25087
2022-04-11_Hela.mgf.16187
2022-04-11_Hela.mgf.5179
2022-04-11_Hela.mgf.26687
2022-04-11_Hela.mgf.29247
2022-04-11_Hela.mgf.28287
2022-04-11_Hela.mgf.27647
2022-04-11_Hela.mgf.26048
2022-04-11_Hela.mgf.25727
2022-04-11_Hela.mgf.24767
2022-04-11_Hela.mgf.28607
2022-04-11_Hela.mgf.6267
2022-04-11_Hela.mgf.29568
2022-04-11_Hela.mgf.25407
2022-04-11_Hela.mgf.32097
2022-04-11_Hela.mgf.27967
2022-04-11_Hela.mgf.6000
2022-04-11_Hela.mgf.27327
2022-04-11_Hela.mgf.15860
2022-04-11_Hela.mgf.28928
2022-04-11_Hela.mgf.27007",Homo sapiens,96,./metanovo/db/180928_UniProt_Proteome_Human_Bos_taurus.fasta.part.1.fasta.csv,23,0.23958333333333334,0.0016585601424273148,0.0016585601424273148,0.001658560142427317,0.8693079739622822,,0.0014417995571080853
...

is it expected that there are newlines in the lines?

reid-wagner commented 2 years ago

Hi @bernt-matthias,

Thank you for these findings - I will be looking at this first thing tomorrow.

reid-wagner commented 2 years ago

It looks like the newlines are the behavior of the upstream tool. I can look further into if this is intentional.

bernt-matthias commented 2 years ago

It looks like the newlines are the behavior of the upstream tool. I can look further into if this is intentional.

Maybe just open an issue upstream

reid-wagner commented 2 years ago

I added tests in #661.

reid-wagner commented 2 years ago

Another warning in the output:

WARNING: An illegal reflective access operation has occurred

The latest bioconda recipe specifically includes JDK 8, which will not produce this warning. I believe JDK 9 & later will handle this "illegal reflective access operation" okay, but JDK 8 is what is used by the upstream singularity image.

reid-wagner commented 2 years ago

parallel has some odd output:

I believe if parallel output is piped or redirected to a file, this is not shown. For example, when the MetaNovo Galaxy tool is run, this does not show up in stdout or stderr (unless you have a counter-example).

reid-wagner commented 2 years ago

@bernt-matthias - I am working on the bioconda side of things to have those specific dependencies come from their conda packages. For this PR, is there anything else that needs to be included for this to be merged, beside review?

reid-wagner commented 2 years ago

@bernt-matthias - I am working on the bioconda side of things to have those specific dependencies come from their conda packages. For this PR, is there anything else that needs to be included for this to be merged, beside review?

bernt-matthias commented 2 years ago

@reid-wagner I would then suggest to wait for the bioconda receipe before merging?

reid-wagner commented 2 years ago

Because it would be useful to have this update available, and the bioconda PR is independent of this, would you be okay with merging this now?

bernt-matthias commented 2 years ago

Thanks @reid-wagner

reid-wagner commented 2 years ago

Thank you!

reid-wagner commented 2 years ago

@bernt-matthias - I have a question about the bioconda updates. In what case would the newer build, but the same version, of a conda dependency for a tool be used? For example, in metanovo on usegalaxy.eu, the latest conda build is not used, and I'd like to figure out how to have it updated.

bernt-matthias commented 2 years ago

Hi @reid-wagner .. only the admins can do this. I guess a message over here https://gitter.im/usegalaxy-eu/Lobby may do the trick.

bgruening commented 2 years ago

I can remove the conda env and reinstall it ... would this solve your problem?

bgruening commented 2 years ago

Done!

reid-wagner commented 2 years ago

Thank you so much @bgruening, that did the trick!