Shenhav-and-Korem-labs / SCRuB

Conda installable and command line executable versions #3

Open jolespin opened 1 year ago

jolespin commented 1 year ago

I'd like to add this to my github.com/jolespin/veba software package under the mapping module. Currently, I can only add packages from conda because the environments are so complex and there are many programs with different dependencies.

Is it possible to add this to the bioconda channel in the near future? Also, could the installation come with a command line executable that can be run outside of R in pipelines?

gaustin15 commented 1 year ago

Thanks for bringing up these points. We are currently working on adding SCRuB to bioconda, and we will post any updates to this issue.

Regarding the command line interface: would our QIIME CLI command described here (https://shenhav-and-korem-labs.github.io/SCRuB/QIIME2-Tutorial.html) work within your pipeline? If not, we also have other scripts embedded in the backend of the q2-SCRuB repo that accept command line arguments, such as

run_SCRuB.R --samples_counts_path plasma_data.csv --sample_metadata_path plasma_metadata.csv --output_path scrub_out.csv

If it helps, we can add this run_SCRuB.R script to the package’s exports (along with additional command line options to accommodate other input formats).

As another alternative that may suit some bash scripts, SCRuB’s base R package can already be run from the command line (with an additional write statement), for example:

Rscript -e 'write.csv(SCRuB::SCRuB("plasma_data.biom", "plasma_metadata.csv")$decontaminated_samples, "scr_out.csv")'
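For use inside a bash pipeline, the one-liner above could be wrapped in a small shell function. This is only a sketch: `run_scrub` and the `RSCRIPT` override are hypothetical names, not part of SCRuB, and it assumes SCRuB's result exposes a `decontaminated_samples` field. The R expression is single-quoted so that the shell does not try to expand `$decontaminated_samples`, and the file paths are passed as trailing arguments rather than pasted into the R code.

```shell
# Hypothetical helper (not shipped with SCRuB): wraps the Rscript call so a
# pipeline can pass file paths as ordinary positional arguments.
# Usage: run_scrub <counts_path> <metadata_path> <output_csv>
run_scrub() {
    local counts="$1" metadata="$2" output="$3"
    # RSCRIPT can be overridden (e.g. with a full path); defaults to Rscript.
    # Paths arrive in R through commandArgs(), so they are never interpolated
    # into the R expression itself.
    "${RSCRIPT:-Rscript}" \
        -e 'a <- commandArgs(trailingOnly = TRUE)' \
        -e 'out <- SCRuB::SCRuB(a[1], a[2])' \
        -e 'write.csv(out$decontaminated_samples, a[3])  # field name assumed' \
        "$counts" "$metadata" "$output"
}
```

A call such as `run_scrub plasma_data.biom plasma_metadata.csv scr_out.csv` would then behave like the one-liner, with the paths supplied by the calling script.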

Please let me know if there are any requirements we can add to our command line executables so that they fit within your pipeline.

jolespin commented 1 year ago

Thanks for getting back to me so quickly! Congrats on the publication by the way.

> Regarding the command line interface: would our QIIME CLI command described here (https://shenhav-and-korem-labs.github.io/SCRuB/QIIME2-Tutorial.html) work within your pipeline? If not, we also have other scripts embedded in the backend of the q2-SCRuB repo that accept command line arguments, such as

Does this require a QIIME installation? QIIME is a VERY heavyweight package with lots of dependencies, so installing it into an existing conda environment that is not dedicated to QIIME is often extremely difficult, especially if only one package is being used.

> run_SCRuB.R --samples_counts_path plasma_data.csv --sample_metadata_path plasma_metadata.csv --output_path scrub_out.csv
>
> If it helps, we can add this run_SCRuB.R script to the package’s exports (along with additional command line options to accommodate other input formats).

This usage looks great! If it could be available with the bioconda installation, that would be awesome. Not sure if you're taking feature requests, but a delimiter or separator argument would make it much easier to integrate directly. Most software I've used takes tab-delimited input. It would be cool if you could do something like:

scrub --samples_counts_path plasma_data.tsv --sample_metadata_path plasma_metadata.tsv --output_path scrub_out.tsv --sep "\t"

What are your thoughts?

gaustin15 commented 1 year ago

> Thanks for getting back to me so quickly! Congrats on the publication by the way.

Thank you!

> Does this require a QIIME installation? QIIME is a VERY heavyweight package with lots of dependencies, so installing it into an existing conda environment that is not dedicated to QIIME is often extremely difficult, especially if only one package is being used.

Yeah, that would require a QIIME installation; it's a good point that a command line option without that requirement would be helpful. We'll add this other approach in future versions and bioconda installations.

> Not sure if you're taking feature requests, but a delimiter or separator argument would make it much easier to integrate directly. Most software I've used takes tab-delimited input. It would be cool if you could do something like:
>
> scrub --samples_counts_path plasma_data.tsv --sample_metadata_path plasma_metadata.tsv --output_path scrub_out.tsv --sep "\t"

Yeah, adding this in would be no trouble at all, thanks for the suggestion!
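
Until such an option exists, the requested interface could be approximated with a thin shell wrapper. The sketch below is not the SCRuB team's implementation: the `scrub` function, the `RSCRIPT` override, and the file-reading choices are assumptions made for illustration. It also assumes `SCRuB::SCRuB` accepts a count matrix and a metadata data frame, and that the result has a `decontaminated_samples` field. The flag names follow the command proposed in this thread.

```shell
# Hypothetical `scrub` wrapper (a sketch, not part of SCRuB).
scrub() {
    local counts="" metadata="" output="" sep=","
    while [ "$#" -gt 0 ]; do
        # Every supported option takes exactly one value.
        case "$1" in
            --samples_counts_path|--sample_metadata_path|--output_path|--sep)
                [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; } ;;
            *)
                echo "unknown option: $1" >&2; return 2 ;;
        esac
        case "$1" in
            --samples_counts_path)  counts="$2" ;;
            --sample_metadata_path) metadata="$2" ;;
            --output_path)          output="$2" ;;
            --sep)                  sep="$2" ;;
        esac
        shift 2
    done
    if [ -z "$counts" ] || [ -z "$metadata" ] || [ -z "$output" ]; then
        echo "usage: scrub --samples_counts_path F --sample_metadata_path F --output_path F [--sep S]" >&2
        return 2
    fi
    # The separator is forwarded unchanged; the R side turns the two-character
    # sequence \t into a real tab before reading and writing.
    "${RSCRIPT:-Rscript}" \
        -e 'a <- commandArgs(trailingOnly = TRUE)' \
        -e 's <- if (a[4] == "\\t") "\t" else a[4]' \
        -e 'counts <- as.matrix(read.table(a[1], sep = s, header = TRUE, row.names = 1, check.names = FALSE))' \
        -e 'meta <- read.table(a[2], sep = s, header = TRUE, row.names = 1)' \
        -e 'out <- SCRuB::SCRuB(counts, meta)  # matrix/data frame input assumed' \
        -e 'write.table(out$decontaminated_samples, a[3], sep = s, quote = FALSE, col.names = NA)' \
        "$counts" "$metadata" "$output" "$sep"
}
```

With this in place, the command proposed above (`scrub ... --sep "\t"`) would run without QIIME. The separator defaults to a comma so that the existing CSV examples keep working.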