Missing all_16S_seqs.udb

ramay commented 2 years ago

Hi,

I have installed the mimosa2 package and vsearch. The test_m2_analysis(test_vsearch = T) function runs successfully. However, when I try to run it on my samples, I get the error shown below. I think I am missing all_16S_seqs.udb, but I am not sure whether it needs to be downloaded or is generated by mimosa.

The configuration file I am using is a tab-delimited file:

file1   ~/Downloads/Microbiome_ASV_abundance.txt
file2   ~/Downloads/metabolomics_KEGG_abundancecsv.csv
file1_type      Sequence variants (ASVs)
ref_choices     RefSeq/EMBL_GEMs genomes and models
data_prefix     ~/projects/Stephens/OTU_EMBL/data/
logTranform     T

Thanks! Hena

Building community metabolic network model
[1] "vsearch --usearch_global ~/projects/Stephens/OTU_EMBL/data//embl_gems/seqtempAKTIX09772.fasta --db ~/projects/Stephens/OTU_EMBL/data//embl_gems/all_16S_seqs.udb --id 0.99 --strand both --blast6out ~/projects/Stephens/OTU_EMBL/data//embl_gems/seqtempAKTIX09772vsearch_results.txt"
vsearch v2.21.1_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch

Fatal error: Unable to get status for input file (/Users/hena/projects/Stephens/OTU_EMBL/data//embl_gems/all_16S_seqs.udb)
Error in setnames(results, paste0("V", 1:6), c("seqID", "dbID", "matchPerc",  : 
  Items of 'old' not found in column names: [V1, V2, V3, V4, V5, V6]. Consider skip_absent=TRUE.
In addition: Warning message:
In fread(paste0(repSeqDir, file_prefix, "vsearch_results.txt"),  :
  File '~/projects/Stephens/OTU_EMBL/data//embl_gems/seqtempAKTIX09772vsearch_results.txt' has size 0. Returning a NULL data.table.

cnoecker commented 2 years ago

Hi Hena, I think you might have downloaded the incorrect reference data archive. If your microbiome data is in the form of ASV sequences (and therefore you want to run vsearch to map them to the embl_gems models), you need to download the first link here https://borenstein-lab.github.io/MIMOSA2shiny/downloads.html (archive should be named ASV_EMBL.tar.gz), and provide the path to it in your configuration file. That archive should include the UDB sequence database. The "OTU_EMBL" archive does not include the sequence database but instead mapping files from Greengenes and SILVA OTUs to the embl_gems reconstructions.

Thanks for your interest in MIMOSA2! Let me know if that resolves the issue.

Cecilia
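A quick way to confirm that the right archive was extracted is to check for the sequence database before starting an analysis. The sketch below is a standalone Python helper, not part of MIMOSA2; the relative path embl_gems/all_16S_seqs.udb is taken from the vsearch command in the log above, while the function name and the example directory are hypothetical.

```python
from pathlib import Path


def find_missing_reference_files(data_prefix, required=("embl_gems/all_16S_seqs.udb",)):
    """Return the required files that are absent under data_prefix."""
    root = Path(data_prefix).expanduser()  # expand a leading "~" as the shell would
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical location of the extracted ASV_EMBL archive; adjust to your setup.
    missing = find_missing_reference_files("~/projects/Stephens/ASV_EMBL/data/")
    if missing:
        print("Missing reference files:", ", ".join(missing))
    else:
        print("Reference sequence database found.")
```

If the list is non-empty, the data_prefix entry in the configuration file most likely points to the wrong archive (for example OTU_EMBL instead of ASV_EMBL).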

ramay commented 2 years ago

Thanks Cecilia! My mistake was using the OTU_EMBL folder. Would mimosa2 also work with Humann3 data, as it does with Humann2?

Also, the link for the SILVA input example, https://borenstein-lab.github.io/MIMOSA2shiny/test_silva.txt, does not work and takes me to a 404 page.

Thanks! Hena

ramay commented 2 years ago

Also, for the KO data, are RPK values OK, or should they be CPM or relative abundances? Thanks, Hena

ramay commented 2 years ago

Sorry to bother you again. I managed to run the unstratified KO Humann3 results with Mimosa2 using RPK abundances. But when I tried using the stratified data, I got the following error:

Error: expected a Humann2-stratified file but ID column is missing and/or column names are formatted differently (no Abundance-RPKs tag). Did you select the correct file format?

When I look at your example, the column names do not have _Abundance-RPKs at the end. I had removed that suffix for the unstratified data and it worked, but here it did not.

So I added the suffix back, but I am still getting this error. My sample names are in this format: UCS_052_base_Abundance-RPKs, and the KO column has entries like this: K00005|g__Escherichia.s__Escherichia_coli. Thanks! Hena

cnoecker commented 2 years ago

Hi Hena, Thanks for the messages. Let me try to answer one by one.

- Yes, inputs from Humann3 and Humann2 should both work. Thanks for the note about the example SILVA file, I'll make a note to update that.
- MIMOSA2 will normalize stratified tables to relative abundances, but it does not normalize unstratified KO tables by default (to allow for different possible normalizations using tools like MUSICC or MicrobeCensus). So you should normalize your unstratified data, but for stratified it may not matter.
- That error message is confusing and we will address it soon - the tool no longer expects the Abundance-RPKs tag that was added to column names in previous versions of Humann. However, it seems that something is incorrect in the format of your stratified file. In particular the column with KO and taxon info (e.g. K00005|g__Escherichia.s__Escherichia_coli) should be named "ID". Feel free to share a few lines of your file if you continue to encounter this error.

Thanks, Cecilia
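For the unstratified case, one simple normalization is to divide each sample column by its total so that every sample sums to 1. The sketch below assumes a tab-delimited table with feature IDs in the first column and one numeric column per sample; it is a standalone Python helper rather than anything MIMOSA2 provides, and total-sum scaling is only one option (MUSICC or MicrobeCensus, mentioned above, normalize differently). The function name and example values are hypothetical.

```python
import csv
import io


def to_relative_abundance(table_text, delimiter="\t"):
    """Divide each sample column by its column total (total-sum scaling)."""
    rows = [r for r in csv.reader(io.StringIO(table_text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    values = [[float(x) for x in row[1:]] for row in body]
    totals = [sum(col) for col in zip(*values)]  # one total per sample column
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for row, vals in zip(body, values):
        # Leave all-zero samples at 0.0 instead of dividing by zero.
        scaled = [v / t if t else 0.0 for v, t in zip(vals, totals)]
        writer.writerow([row[0]] + scaled)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative RPK values, not real data.
    example = "KO\tS1\tS2\nK00001\t30\t5\nK00002\t10\t15\n"
    print(to_relative_abundance(example), end="")
```

The header row is passed through unchanged, so the sample names and the first column's name stay exactly as they were in the input.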

ramay commented 2 years ago

Thanks Cecilia. Changing the column name from KO to ID helped and the error disappeared! Hena
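For anyone hitting the same error, the header fix can be scripted. The sketch below is a standalone Python helper (the function name is hypothetical) that renames the first column of a tab-delimited header line to ID. It also strips a trailing _Abundance-RPKs suffix from sample names; that part is an assumption based on the example files, since the thread only confirms that renaming the column resolved the error.

```python
def fix_stratified_header(header_line, delimiter="\t", suffix="_Abundance-RPKs"):
    """Rename the first column to 'ID' and drop a trailing suffix from sample names."""
    columns = header_line.rstrip("\r\n").split(delimiter)
    # Only remove the suffix when it is actually at the end of the sample name.
    samples = [c[: -len(suffix)] if c.endswith(suffix) else c for c in columns[1:]]
    return delimiter.join(["ID"] + samples)


if __name__ == "__main__":
    # Sample name taken from the message above; the first column was named "KO".
    print(fix_stratified_header("KO\tUCS_052_base_Abundance-RPKs"))
```

Only the header line changes; data rows such as K00005|g__Escherichia.s__Escherichia_coli are left untouched.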

borenstein-lab / mimosa2

Missing all_16S_seqs.udb #3