StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0
187 stars 119 forks source link

RDP classifier #1040

Closed taylorpaisie closed 1 month ago

taylorpaisie commented 1 month ago

Pull Request (PR) checklist:

chrisgulvik commented 1 month ago

FYI this is a super fast and useful Genus-level 16S rRNA gene classifier tool. Part of the reason the container is bigger than most is it includes the database file within the package. There's no documentation to change/create the database either. The developer left MSU (RDP Project) years ago for industry, and just made a code update (with new builtin DB) last year on her own time.

erinyoung commented 1 month ago

This image looks great! Thank you for putting it together.

The GA failed because your labels are in the wrong stage. Could you move them to 'app'?

Also, is there are reason why you are using so many ENV variables? ENV persist after build time, so we generally recommend to use ARG instead so that the environment doesn't end up with unused environmental variables.

erinyoung commented 1 month ago

Your tests all passed:

#17 [test 4/4] RUN python3 -m unittest discover -v -s ../tests
#17 0.390 2024-08-28 20:06:06 URL:https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/16S_rRNA_gene.Burkholderia_pseudomallei.2002721184.AY305776.1.fasta [1629/1629] -> "16S_test.fa" [1]
#17 0.613 2024-08-28 20:06:06 URL:https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/18S_rRNA_gene.Homo_sapiens.T2T-CHM13v2.0.Chromosome13.fasta [2026/2026] -> "18S_test.fa" [1]
#17 0.687 Exception in thread "main" java.lang.InternalError: Error loading java.security file
#17 0.687   at java.base/java.security.Security.initialize(Security.java:94)
#17 0.688   at java.base/java.security.Security$1.run(Security.java:79)
#17 0.688   at java.base/java.security.Security$1.run(Security.java:77)
#17 0.688   at java.base/java.security.AccessController.doPrivileged(Native Method)
#17 0.688   at java.base/java.security.Security.<clinit>(Security.java:77)
#17 0.688   at java.base/sun.security.util.SecurityProperties.getOverridableProperty(SecurityProperties.java:53)
#17 0.688   at java.base/sun.security.util.SecurityProperties.privilegedGetOverridable(SecurityProperties.java:44)
#17 0.688   at java.base/sun.security.util.FilePermCompat.<clinit>(FilePermCompat.java:45)
#17 0.688   at java.base/java.io.FilePermission.init(FilePermission.java:317)
#17 0.688   at java.base/java.io.FilePermission.<init>(FilePermission.java:488)
#17 0.688   at java.base/sun.net.www.protocol.file.FileURLConnection.getPermission(FileURLConnection.java:221)
#17 0.688   at java.base/sun.net.www.protocol.jar.JarFileFactory.getPermission(JarFileFactory.java:162)
#17 0.688   at java.base/sun.net.www.protocol.jar.JarFileFactory.getCachedJarFile(JarFileFactory.java:126)
#17 0.689   at java.base/sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:81)
#17 0.689   at java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:125)
#17 0.689   at java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
#17 0.689   at java.base/java.net.URL.openStream(URL.java:1165)
#17 0.689   at java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1744)
#17 0.689   at java.base/java.lang.Class.getResourceAsStream(Class.java:2650)
#17 0.689   at edu.msu.cme.rdp.classifier.utils.ClassifierFactory.getFactory(ClassifierFactory.java:110)
#17 0.689   at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:64)
#17 0.689   at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:75)
#17 0.689   at edu.msu.cme.rdp.multicompare.Main.main(Main.java:247)
#17 0.689   at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:67)
#17 0.781 Exception in thread "main" java.lang.InternalError: Error loading java.security file
#17 0.781   at java.base/java.security.Security.initialize(Security.java:94)
#17 0.781   at java.base/java.security.Security$1.run(Security.java:79)
#17 0.781   at java.base/java.security.Security$1.run(Security.java:77)
#17 0.782   at java.base/java.security.AccessController.doPrivileged(Native Method)
#17 0.782   at java.base/java.security.Security.<clinit>(Security.java:77)
#17 0.782   at java.base/sun.security.util.SecurityProperties.getOverridableProperty(SecurityProperties.java:53)
#17 0.782   at java.base/sun.security.util.SecurityProperties.privilegedGetOverridable(SecurityProperties.java:44)
#17 0.782   at java.base/sun.security.util.FilePermCompat.<clinit>(FilePermCompat.java:45)
#17 0.782   at java.base/java.io.FilePermission.init(FilePermission.java:317)
#17 0.782   at java.base/java.io.FilePermission.<init>(FilePermission.java:488)
#17 0.782   at java.base/sun.net.www.protocol.file.FileURLConnection.getPermission(FileURLConnection.java:221)
#17 0.782   at java.base/sun.net.www.protocol.jar.JarFileFactory.getPermission(JarFileFactory.java:162)
#17 0.782   at java.base/sun.net.www.protocol.jar.JarFileFactory.getCachedJarFile(JarFileFactory.java:126)
#17 0.782   at java.base/sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:81)
#17 0.782   at java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:125)
#17 0.782   at java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
#17 0.782   at java.base/java.net.URL.openStream(URL.java:1165)
#17 0.783   at java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1744)
#17 0.783   at java.base/java.lang.Class.getResourceAsStream(Class.java:2650)
#17 0.783   at edu.msu.cme.rdp.classifier.utils.ClassifierFactory.getFactory(ClassifierFactory.java:110)
#17 0.783   at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:64)
#17 0.783   at edu.msu.cme.rdp.multicompare.MultiClassifier.<init>(MultiClassifier.java:75)
#17 0.783   at edu.msu.cme.rdp.multicompare.Main.main(Main.java:247)
#17 0.783   at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:67)
#17 0.791 test_rdp16S (test_controls.TestControls) ... ok
#17 0.791 test_rdp18S (test_controls.TestControls) ... ok
#17 0.791 test_python (test_versions.TestVersion) ... ok
#17 0.792 
#17 0.792 ----------------------------------------------------------------------
#17 0.792 Ran 3 tests in 0.680s
#17 0.792 
#17 0.792 OK
#17 DONE 0.8s
erinyoung commented 1 month ago

I'm trying to provide some guidance on a CMD layer. This would be the default command that runs when given no arguments (i.e. docker run staphb/rdp:latest). Generally we use <tool --help>, but that doesn't seem to work for rdp.

chrisgulvik commented 1 month ago

I'm trying to provide some guidance on a CMD layer. This would be the default command that runs when given no arguments (i.e. docker run staphb/rdp:latest). Generally we use <tool --help>, but that doesn't seem to work for rdp.

@erinyoung Only rdp_classifier (no args) brings up the help menu with the subcommand options (e.g., rdp_classifier classify). Both -h and --help still print the help menu including a 1-line warning that neither is a supported flag, although both return 0 exit status still.

erinyoung commented 1 month ago

Sweet! Thank you for your response! Can I push some edits your dockerfile? I'd like to move the labels, change some ENV to ARG, and add a CMD line.

taylorpaisie commented 1 month ago

Yeah go for it @erinyoung, sorry been meaning to make some corrections, just haven't had the time yet! But thank you so much for your help!

erinyoung commented 1 month ago

@taylorpaisie , as I was looking over your Dockerfile, it doesn't look like this tool requires compiling, so the 'builder' stage may not be worth the effort.

I've made some adjustments to your Dockerfile and README.md in this PR. Let me know if you think these will work.

For the Dockerfile: Removed most of the 'ENV' layers except for the PATH update Removed the builder stage and instead reduces the amount downloaded with apt-get Switched hard-coded filepath to one with an ARG (it's easier to update these in the future) Switched the base.image to jammy instead of focal (jammy is newer, so the image will have a longer lifespan) Adjusted the path of the repository Added CMD layer (this is the command the executes if no command is given to the image) Added a new workdir for the test stage Added classifier to test stage (to ensure it was in path) Updated python version test to 3.10

The only things left are to add rdp/classifier to the following two files:

I would also encourage you (and anyone else who assisted with this) to add your github handle to the bottom of the README.md at root. I love it when people get credit for contributing work.

erinyoung commented 1 month ago

Thank you for putting this together!

The GA that are using your Dockerfile to build an image and deploy it to Dockerhub and quay can be found here : https://github.com/StaPH-B/docker-builds/actions/runs/10801924836

Once this action is complete, the image should be available at https://hub.docker.com/r/staphb/rdp with the tags of 'latest' and '2.14'