Open amcaraballor opened 5 years ago
@alexeigurevich should be able to answer that.
Yes, the missing documentation is my fault; I plan to write it soon. There is more or less complete documentation for the command-line version of the tool (here), but not for the GNPS workflow. Note that in the command-line version we renamed RiPPquest to MetaMiner and improved it in some respects. The new publication will come out in Cell Systems soon; by that time I plan to update the GNPS workflow with the new functionality as well (and also rename it to MetaMiner).
You can find small sample data for RiPPquest/MetaMiner in our GitHub repo here. For convenience, I have attached an archive with all these files (spectra and sequence) plus a correspondence file (not available in the repo): RiPPquest_test_data.zip. For this data, you can use "Running mode: high-high" (the default is "high-low"). A sample job with this data is here.
You are right regarding all the file extensions:
Spectrum Files (Required): mzML, mzXML, mgf?
We natively support mzXML and MGF and automatically convert all other formats (e.g. mzML) to MGF using the third-party msconvert utility.
Sequence Files (Required): fasta?
Yes, we expect a fasta file with nucleotide sequence(s).
Spectra-Sequence Correspondence File (Optional): format? .csv, .tsv, .txt? template available?
The file should be tab-separated and have two columns listing the basenames of the spectra and sequence files. If it is not provided, an all-vs-all analysis is performed. The extension doesn't matter here; it could be any of .csv, .tsv, .txt. We expect the first column to contain the spectra files and the second the sequence files. To change the order of the columns, you can use an optional header line:
Sequence Spectra
(use tab in between, don't copy-paste a space character from here)
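To avoid the tab-versus-space problem entirely, the correspondence file can be generated by a script. The following is only a minimal sketch, not part of RiPPquest/MetaMiner: the function name `write_correspondence`, the `sequence_first` flag, and the example file names are made up for illustration. It also assumes that "basename" means the file name with the directory stripped and the extension kept, which is worth checking against the correspondence file in the sample archive.

```python
import csv
import os


def write_correspondence(pairs, out_path, sequence_first=False):
    """Write (spectra_file, sequence_file) pairs as a tab-separated file.

    Hypothetical helper for illustration only.
    By default the columns are: spectra, sequence (no header line).
    With sequence_first=True the header line "Sequence<TAB>Spectra" is
    written and the columns are swapped accordingly.
    """
    with open(out_path, "w", newline="") as fh:
        # csv.writer guarantees a real tab character between the columns.
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        if sequence_first:
            writer.writerow(["Sequence", "Spectra"])
        for spectra, sequence in pairs:
            # Assumption: basename = file name without directory, extension kept.
            row = [os.path.basename(spectra), os.path.basename(sequence)]
            if sequence_first:
                row.reverse()
            writer.writerow(row)


# Example pairs (made-up file names).
pairs = [
    ("data/sample1.mzXML", "genomes/strainA.fasta"),
    ("data/sample2.mgf", "genomes/strainB.fasta"),
]
write_correspondence(pairs, "correspondence.tsv")
```

Calling `write_correspondence(pairs, "correspondence.tsv", sequence_first=True)` instead writes the header line shown above and puts the sequence file names in the first column.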
I will keep this issue open until I publish proper documentation for the RiPPquest GNPS workflow. Thanks.
Thanks a lot, dear @alexeigurevich. This reply is super useful until MetaMiner gets incorporated into GNPS. I will keep you posted on any issues.
@alexeigurevich Thanks for the detailed response. Feel free to add a page to the https://github.com/CCMS-UCSD/GNPSDocumentation repository and we can make it live once the tool goes live.
Is there any dataset that can be used for new users to learn and test the workflow? The following files will be necessary: Spectrum Files (Required): mzML, mzXML, mgf? Sequence Files (Required): fasta? Spectra-Sequence Correspondence File (Optional): format? .csv, .tsv, .txt? template available?