Closed jadeaver closed 2 months ago
This error was resolved by updating to version 1.2.0.
I am re-opening because the output report is not as expected. The sunburst plot reports the expected taxa; however, the taxonomy, abundances, and diversity figures classify all the sequences as "unknown". Please see both the abundances table and the sunburst plot below.
Hi @jadeaver ,
Sorry for the delay, I'll take a look at it. I suppose that the sunburst and the sankey plots are working and the problem is in the rest of the plots. Could you paste a few lines of the two database files: the ref2taxid and the fasta?
Thanks for taking a look into this. Yes, the sunburst and sankey plots are working. The plots/tables under taxonomy, abundances, and alpha diversity are showing all as "unknown".
The first few lines of the ref2taxid are:
FLASV1.1417 895459642
FLASV2.1445 893084087
FLASV3.1527 60446185
The first few entries of the fasta are:
>FLASV1.1417
GATGAACGCTGGCGGCGTGCTTAACACATGCAAGTTGAACGGTCTGCTTAGGTAGACAGTGGCGCACGGGTGAGTAACGC
GTAGGTGACCTATCCTTTAGTGGGGGATAACTCAGGGAAACTTGAGCTAATACCGCATGAGCTTGTGGTTGTTAGAGGGC
CACAAGGAAAGCAGCAATGCGCTGAGGGAGGGGCCTGCGTCCGATTAGCTAGTTGGCAAGGTAACGGCTTACCAAGGCGA
TGATCGGTAGCTGGTCTGAGAGGACGATCAGCCACATTGGCACTGAGACACGGGCCAAACTCCTACGGGAGGCAGCAGTG
AGGAATATTGGGCAATGGCCGAAAGGCTGACCCAGCAACGCCGCGTGGAGGACGAAGGCTTTCGGGTTGTAAACTCCTTT
TCCGGGGGACGAGGAAGGACGGTACCCTGGGAATAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGTG
GCGAGCGTTATCCGGATTTACTGGGCGTAAAGAGCGCGTAGGTGGTTGAGTAAGTTGGATGTAAAATCTCTTGGCTTAAC
TGGGAGGAGACGTTCAAGACTGCTTGGCTTGAGGGCGAGAGAGGGGTGCAGAATTCCCGGTGTAGTGGTGGAATGCGTAG
ATATCGGGAGGAATACCAGTGGCGAAAGCGGCGCCCTGGCTCGCAACTGACACTGAGGCGCGAAAGCGTGGGTAGCGAAC
GGGATTAGATACCCCGGTAGTCCACGCTGTAAACGATGTGAACTGGGTGTTGGCGGTATGAATTCCGTCGGTGCCGTAGC
AAACGCGATAAGTTCACCGCCTGGGGAGTACGGTCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCAG
CGGAGCGTGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCTGGGTTTGACATGGGCGTAGTAGTGAACCGAAAGGGG
AACGAGCCTTCGGGCAGCGTCCACAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCA
ACGAGCGCAACCCCTGTTGCCAGTTATAAGTGTCTGGCGAGACTGCCGGTATCAAGCCGGAGGAAGGTGGGGATGACGTC
AAGTCAGCATGGCCTTTATATCCAGGGCTACACACACGCTACAATGGTCGGTACAGAGGGTTGCAAAGCCGCGAGGTAGA
GCTAATCTCACAAAGCCGGCCTCAGTTCAGATTGGAGGCTGCAACTCGCCTCCATGAAGTCGGAGTTGCTAGTAATCGCC
GGTCAGCAATACGGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCATGGGAGCTGGTAACACCTGAA
GTCGGTGAGCTAACCGCGAGGAGGCAGCCGCCGAGGGTGGGACTAGTGACTGGGACG
>FLASV2.1445
GACGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGCGACCAGCCGGTGCTTGCACTGGCGAAGTCGAGTGGCGA
ACGGGTGAGTAACACGTGAGAAACCTACCCTGGAGTGGGGAATAACTCGAAGAAATTCGAGCTAATACCGCATACCTTCT
TACCGTCGAATGGTGGTTTGAAGAAAGATTTATCGCTCTGGGAGGGTCTCGCGGCCTATCAGCTAGTTGGTGAGGTAACG
GCTCACCAAGGCATCGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTAC
GGGAGGCAGCAGTAGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCAATGCCGCGTGCGGGACGAAGGCCCTAGGG
TCGTAAACCGCTTTCAGTAGGGACGAAAATGACGGTACCTGCAGAAGAAGCTCCGGCCAACTACGTGCCAGCAGCCGCGG
TGATACGTAGGGAGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGGGCTCGTAGGTGGTTGAGTAAGTCAGATGTGAAAT
CTCAGGGCCCAACCCTGAGCGTGCATTTGATACTGCTCTGACTAGAGTCCGGTAGGGGAGTGCGGAATTCCTGGTGTAGC
GGTGAAATGCGCAGATATCAGGAGGAACACCGACAGCGAAGGCAGCACTCTGGGCCGGTACTGACACTGAGGAGCGAAAG
CATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCACTAGGTGTGGGGAGAACTCAACTC
TCTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGG
GGCCCGCACAAGCGGCGGAGCATGTTGCTTAATTCGAGGCAACGCGAAGAACCTTACCTGGGTTGAACTACGTGGGAAAA
GCCGCAGAGATGCGGTGTCCTTCGGGGTCCACGATAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGT
TAAGTCCCGCAACGAGCGCAACCCTTGTCCTATGTTGCCAGCGGGTAAAGCCGGGGACTCGTAGGAGACTGCCGGGGTCA
ACTCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCAAACATGCTACAATGGCCGGTAC
AACGGGCAGCTAAACCGCGAGGTCAAGCGAATCCCACAAAGCCGGTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCA
TGAAGCTGGAGTCGCTAGTAATCCCGGATCAGCAACGCCGGGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCA
CACGCCGAAAGTCGGCAACACCCGAAGTCAGTGGCCCAACCCCTAGGGGAGGGAGCTGCCGAAGGTGGGGCTGGCGATTG
GGGTG
>FLASV3.1527
CTTCGACGGAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGCGGCCATCCG
GTGCTTGCACTGGTGAAGCCGAGTGGCGAACGGGTGAGTAACACGTGAGAAACCTGCCCTGGAGTGGGGAATAACTCGAA
GAAATTCGAGCTAATACCGCATACCTTCTCTTCACCGCATGGTGAGTTGAAGAAAGATTTATCGCTCTAGGAGGGTCTCG
CGGCCTATCAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACAC
TGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCA
ATGCCGCGTGCGGGACGAAGGCCCTAGGGTCGTAAACCGCTTTCAGTAGGGACGAAAATGACGGTACCTGCAGAAGAAGC
TCCGGCCAACTACGTGCCAGCAGCCGCGGTGATACGTAGGGAGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGGGCTCG
TAGGTGGTTGAGTAAGTCAGATGTGAAATCTCAGGGCCCAACCCTGAGCCTGCATTTGATACTGCTCTGACTAGAGTCCG
GTAGGGGAGTGCGGAACTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAAGAACACCGACAGCGAAGGCAGCACTCT
GGGCCGGTACTGACACTGAGGAGCGAAAGCATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTT
GGGCACTAGGTGTGGGGAGAACTCAACTCTCTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCG
CAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTTGCTTAATTCGAGGCAACGCGAAGAA
CCTTACCTGGGTTGAACTACGTGGGAAAAGCCGCAGAGATGCGGTGTCCTTCGGGGTCCACGATAGGTGGTGCATGGCTG
TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTATGTTGCCAGCGGGTAAAGC
CGGGGACTCGTAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGCCCCTTATGTCCAG
GGCTGCAAACATGCTACAATGGCCGGTACAAAGGGCAGCTAAACCGCGAGGTCAAGCGAATCCCAAAAAGCCGGTCTCAG
TTCGGATTGAAGTCTGCAACTCGACTTCATGAAGCTGGAGTCGCTAGTAATCCCGGATCAGCAACGCCGGGGTGAATACG
TTCCCGGGCCTTGTACACACCGCCCGTCACACGCCGAAAGTCGATAACACCCGAAGTCAGTGGCCCAACCCTTTAGGGAG
GGAGCTGCCGAAGGTGGGATTGGCGATTGGGGTGAAGTCGTAACAAGGTAGCCGTACCGGAAGGTGCGGCTGGATCACCT
CCTTTCT
I have a detailed document with the steps I took to create the custom database as well if that would be helpful.
Thank you very much! Where do these taxids come from (FLASV1.1417 taxid: 895459642)? Are they from NCBI?
No, they are from the MiDAS database (https://www.midasfieldguide.org/guide/downloads), which is a curated 16S reference database for wastewater microbiomes. I downloaded the Qiime.fa and QIIME.txt files. I reformatted the Qiime.txt file to include the column headers id, kingdom, phylum, etc., and used taxonkit to create the taxdump files. I used the Qiime.fa file to make the minimap database, and I used the taxid.map that was created alongside the taxdump files as the ref2taxid file.
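Since the database is assembled from three separately generated pieces, one quick sanity check is to confirm that every FASTA header has an entry in the ref2taxid file. The sketch below is only an illustration of that check, not part of wf-16s: the function names are hypothetical, it assumes the ref2taxid file is whitespace-delimited with the reference ID in the first column, and the sequences in the example are truncated copies of the entries pasted above.

```python
def read_fasta_ids(lines):
    """Return the sequence IDs (first token after '>') found in FASTA lines."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def read_ref2taxid(lines):
    """Map reference ID -> taxid from whitespace-delimited 'id taxid' lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def ids_without_taxid(fasta_lines, ref2taxid_lines):
    """List FASTA IDs that have no entry in the ref2taxid mapping."""
    mapping = read_ref2taxid(ref2taxid_lines)
    return [seq_id for seq_id in read_fasta_ids(fasta_lines) if seq_id not in mapping]


# Headers and taxids copied from the thread; sequences truncated for brevity.
fasta = [
    ">FLASV1.1417", "GATGAACGCTGG",
    ">FLASV2.1445", "GACGAACGCTGG",
    ">FLASV3.1527", "CTTCGACGGAGA",
]
ref2taxid = [
    "FLASV1.1417 895459642",
    "FLASV2.1445 893084087",
    "FLASV3.1527 60446185",
]
print(ids_without_taxid(fasta, ref2taxid))  # an empty list means every ID is mapped
```

For real files, the same functions can be fed `open(path)` handles instead of the in-memory lists.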
Hi! I've been investigating this issue, and for me it looks normal (although I'm using a different dataset). The only time I observe the same problem as you (all the sequences "unknown" in the table but the sunburst showing data) is when I don't remove the ';' from the taxonomy names. Could you please try a small check to see whether something else is happening?
csvtk space2tab QIIME.txt_MiDAS_5.3.txt > QIIME.txt_MiDAS_5.3.tsv # convert the space-delimited file to tab-delimited
sed -i -r 's/[;]+//g' QIIME.txt_MiDAS_5.3.tsv # remove every ';' from the taxonomy names
sed -i '1i id\tsuperkingdom\tphylum\tclass\torder\tfamily\tgenus\tspecies' QIIME.txt_MiDAS_5.3.tsv # add the header row
taxonkit create-taxdump -A 1 QIIME.txt_MiDAS_5.3.tsv -O MiDAS_5.3.taxdump
Then run the workflow with:
--reference ~/databases/MIDAS/QIIME.fa_MiDAS_5.3.fa --ref2taxid ~/databases/MIDAS/MiDAS_5.3.taxdump/taxid.map --taxonomy ~/databases/MIDAS/MiDAS_5.3.taxdump/
Then check whether the results make more sense to you.
Thank you very much in advance
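The reformatting steps above (spaces to tabs, strip ';', prepend a header) can also be sketched in Python, which may be handy on macOS since the stock `sed` there differs from GNU `sed` in how it handles `-i`. This is a rough equivalent under stated assumptions, not the workflow's own tooling: it assumes each input line is a reference ID followed by seven whitespace-separated rank names with no spaces inside a name, the function names are hypothetical, and the example line uses invented placeholder names rather than real MiDAS records.

```python
HEADER = ["id", "superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def qiime_line_to_tsv(line):
    """Split on any run of whitespace, drop every ';', and rejoin with tabs."""
    fields = [field.replace(";", "") for field in line.split()]
    return "\t".join(fields)


def convert(lines):
    """Return the header row followed by one tab-delimited row per non-blank line."""
    out = ["\t".join(HEADER)]
    for line in lines:
        if line.strip():
            out.append(qiime_line_to_tsv(line))
    return out


# Placeholder input line (hypothetical names, not taken from MiDAS).
example = ["REF1\tk__A; p__B; c__C; o__D; f__E; g__F; s__G"]
for row in convert(example):
    print(row)
```

The rows returned by `convert` can be written to a `.tsv` file with `"\n".join(...)` and then passed to the `taxonkit create-taxdump` command shown above.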
Thank you for looking into this and for these suggestions! I remade the custom database using your suggested commands (modified slightly because I am using macOS, not Linux). The workflow ran successfully and output the abundance tables in addition to the other figures. Whoo!
A note in case anyone else runs into this issue - I don't think it was the ; in my case. I had removed them during my initial attempt at creating my custom database. My best guess is that something went wrong when I converted the txt file to a tsv file and the formatting was off.
Thank you again for your time @nggvs!
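For anyone wanting to rule out the kind of formatting problem guessed at above before rebuilding the taxdump, a small check of the TSV might look like the following. It is only a sketch: the function name is hypothetical, and the expected header is assumed to be the eight columns used in the commands earlier in this thread.

```python
EXPECTED_HEADER = ["id", "superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def check_taxonomy_tsv(lines):
    """Return a list of human-readable problems found in the TSV lines."""
    if not lines:
        return ["file is empty"]
    problems = []
    header = lines[0].rstrip("\n").split("\t")
    if header != EXPECTED_HEADER:
        problems.append(f"line 1: unexpected header {header}")
    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split("\t")
        # A row that is still space-delimited collapses into a single field.
        if len(fields) != len(EXPECTED_HEADER):
            problems.append(
                f"line {number}: {len(fields)} fields, expected {len(EXPECTED_HEADER)}"
            )
        if ";" in line:
            problems.append(f"line {number}: contains ';'")
    return problems


# Placeholder rows (hypothetical names): one well-formed, one still space-delimited.
good = ["\t".join(EXPECTED_HEADER), "REF1\tk__A\tp__B\tc__C\to__D\tf__E\tg__F\ts__G"]
bad = ["\t".join(EXPECTED_HEADER), "REF1 k__A; p__B; c__C; o__D; f__E; g__F; s__G"]
print(check_taxonomy_tsv(good))
print(check_taxonomy_tsv(bad))
```

An empty list means every row has the expected number of tab-separated fields and no stray ';'.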
Operating System
macOS
Other Linux
No response
Workflow Version
v1.1.2
Workflow Execution
EPI2ME Desktop (Local)
Other workflow execution
No response
EPI2ME Version
v5.1.14
CLI command run
No response
Workflow Execution - CLI Execution Profile
None
What happened?
I created a custom database following the documentation provided in this tutorial. I successfully created the taxdump files, minimap2 database, and ref2taxid file. The wf-16s pipeline runs as expected until the makeReport step, where I encounter the error "Process minimap_pipeline:makeReport (1) terminated with an error exit status (1)" with the message KeyError: "The following 'id_vars' are not present in the DataFrame: ['species']" (please see the attached nextflow log). I do get the abundances table output, so perhaps I don't really need the full report. However, my question is: what may be causing this error, and is there a file I might need to fix in my custom database to be able to get the full report output?
nextflow.log
The first few lines of my output abundance file are below.
Relevant log output
Application activity log entry
Were you able to successfully run the latest version of the workflow with the demo data?
other (please describe below)
Other demo data information