Open jadeaver opened 1 month ago
This error was resolved by updating to version 1.2.0.
I am re-opening because the output report is not as expected. The sunburst plot reports the expected taxa; however, the taxonomy, abundances and diversity figures classify all the sequences as "unknown". Please see both the abundances table and the sunburst plot below.
Hi @jadeaver ,
Sorry for the delay, I'll take a look at it. I suppose that the sunburst and the sankey are working and the problem is in the rest of the plots. Could you paste a few lines of the two database files: the ref2taxid and the fasta?
Thanks for taking a look into this. Yes, the sunburst and sankey plots are working. The plots/tables under taxonomy, abundances, and alpha diversity are showing all as "unknown".
The first few lines of the ref2taxid are:
FLASV1.1417 895459642
FLASV2.1445 893084087
FLASV3.1527 60446185
The first few entries of the fasta are:
>FLASV1.1417
GATGAACGCTGGCGGCGTGCTTAACACATGCAAGTTGAACGGTCTGCTTAGGTAGACAGTGGCGCACGGGTGAGTAACGC
GTAGGTGACCTATCCTTTAGTGGGGGATAACTCAGGGAAACTTGAGCTAATACCGCATGAGCTTGTGGTTGTTAGAGGGC
CACAAGGAAAGCAGCAATGCGCTGAGGGAGGGGCCTGCGTCCGATTAGCTAGTTGGCAAGGTAACGGCTTACCAAGGCGA
TGATCGGTAGCTGGTCTGAGAGGACGATCAGCCACATTGGCACTGAGACACGGGCCAAACTCCTACGGGAGGCAGCAGTG
AGGAATATTGGGCAATGGCCGAAAGGCTGACCCAGCAACGCCGCGTGGAGGACGAAGGCTTTCGGGTTGTAAACTCCTTT
TCCGGGGGACGAGGAAGGACGGTACCCTGGGAATAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAAAACGTAGGTG
GCGAGCGTTATCCGGATTTACTGGGCGTAAAGAGCGCGTAGGTGGTTGAGTAAGTTGGATGTAAAATCTCTTGGCTTAAC
TGGGAGGAGACGTTCAAGACTGCTTGGCTTGAGGGCGAGAGAGGGGTGCAGAATTCCCGGTGTAGTGGTGGAATGCGTAG
ATATCGGGAGGAATACCAGTGGCGAAAGCGGCGCCCTGGCTCGCAACTGACACTGAGGCGCGAAAGCGTGGGTAGCGAAC
GGGATTAGATACCCCGGTAGTCCACGCTGTAAACGATGTGAACTGGGTGTTGGCGGTATGAATTCCGTCGGTGCCGTAGC
AAACGCGATAAGTTCACCGCCTGGGGAGTACGGTCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCAG
CGGAGCGTGTGGTTTAATTCGATGCAACGCGAAAAACCTTACCTGGGTTTGACATGGGCGTAGTAGTGAACCGAAAGGGG
AACGAGCCTTCGGGCAGCGTCCACAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCA
ACGAGCGCAACCCCTGTTGCCAGTTATAAGTGTCTGGCGAGACTGCCGGTATCAAGCCGGAGGAAGGTGGGGATGACGTC
AAGTCAGCATGGCCTTTATATCCAGGGCTACACACACGCTACAATGGTCGGTACAGAGGGTTGCAAAGCCGCGAGGTAGA
GCTAATCTCACAAAGCCGGCCTCAGTTCAGATTGGAGGCTGCAACTCGCCTCCATGAAGTCGGAGTTGCTAGTAATCGCC
GGTCAGCAATACGGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCATGGGAGCTGGTAACACCTGAA
GTCGGTGAGCTAACCGCGAGGAGGCAGCCGCCGAGGGTGGGACTAGTGACTGGGACG
>FLASV2.1445
GACGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGCGACCAGCCGGTGCTTGCACTGGCGAAGTCGAGTGGCGA
ACGGGTGAGTAACACGTGAGAAACCTACCCTGGAGTGGGGAATAACTCGAAGAAATTCGAGCTAATACCGCATACCTTCT
TACCGTCGAATGGTGGTTTGAAGAAAGATTTATCGCTCTGGGAGGGTCTCGCGGCCTATCAGCTAGTTGGTGAGGTAACG
GCTCACCAAGGCATCGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTAC
GGGAGGCAGCAGTAGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCAATGCCGCGTGCGGGACGAAGGCCCTAGGG
TCGTAAACCGCTTTCAGTAGGGACGAAAATGACGGTACCTGCAGAAGAAGCTCCGGCCAACTACGTGCCAGCAGCCGCGG
TGATACGTAGGGAGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGGGCTCGTAGGTGGTTGAGTAAGTCAGATGTGAAAT
CTCAGGGCCCAACCCTGAGCGTGCATTTGATACTGCTCTGACTAGAGTCCGGTAGGGGAGTGCGGAATTCCTGGTGTAGC
GGTGAAATGCGCAGATATCAGGAGGAACACCGACAGCGAAGGCAGCACTCTGGGCCGGTACTGACACTGAGGAGCGAAAG
CATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTTGGGCACTAGGTGTGGGGAGAACTCAACTC
TCTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGG
GGCCCGCACAAGCGGCGGAGCATGTTGCTTAATTCGAGGCAACGCGAAGAACCTTACCTGGGTTGAACTACGTGGGAAAA
GCCGCAGAGATGCGGTGTCCTTCGGGGTCCACGATAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGT
TAAGTCCCGCAACGAGCGCAACCCTTGTCCTATGTTGCCAGCGGGTAAAGCCGGGGACTCGTAGGAGACTGCCGGGGTCA
ACTCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTGCAAACATGCTACAATGGCCGGTAC
AACGGGCAGCTAAACCGCGAGGTCAAGCGAATCCCACAAAGCCGGTCTCAGTTCGGATTGAAGTCTGCAACTCGACTTCA
TGAAGCTGGAGTCGCTAGTAATCCCGGATCAGCAACGCCGGGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCA
CACGCCGAAAGTCGGCAACACCCGAAGTCAGTGGCCCAACCCCTAGGGGAGGGAGCTGCCGAAGGTGGGGCTGGCGATTG
GGGTG
>FLASV3.1527
CTTCGACGGAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCATGCCTAATACATGCAAGTCGAACGCGGCCATCCG
GTGCTTGCACTGGTGAAGCCGAGTGGCGAACGGGTGAGTAACACGTGAGAAACCTGCCCTGGAGTGGGGAATAACTCGAA
GAAATTCGAGCTAATACCGCATACCTTCTCTTCACCGCATGGTGAGTTGAAGAAAGATTTATCGCTCTAGGAGGGTCTCG
CGGCCTATCAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCATCGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACAC
TGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCA
ATGCCGCGTGCGGGACGAAGGCCCTAGGGTCGTAAACCGCTTTCAGTAGGGACGAAAATGACGGTACCTGCAGAAGAAGC
TCCGGCCAACTACGTGCCAGCAGCCGCGGTGATACGTAGGGAGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGGGCTCG
TAGGTGGTTGAGTAAGTCAGATGTGAAATCTCAGGGCCCAACCCTGAGCCTGCATTTGATACTGCTCTGACTAGAGTCCG
GTAGGGGAGTGCGGAACTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAAGAACACCGACAGCGAAGGCAGCACTCT
GGGCCGGTACTGACACTGAGGAGCGAAAGCATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGTT
GGGCACTAGGTGTGGGGAGAACTCAACTCTCTCCGCGCCGTAGCTAACGCATTAAGTGCCCCGCCTGGGGAGTACGGCCG
CAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTTGCTTAATTCGAGGCAACGCGAAGAA
CCTTACCTGGGTTGAACTACGTGGGAAAAGCCGCAGAGATGCGGTGTCCTTCGGGGTCCACGATAGGTGGTGCATGGCTG
TCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTATGTTGCCAGCGGGTAAAGC
CGGGGACTCGTAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGCCCCTTATGTCCAG
GGCTGCAAACATGCTACAATGGCCGGTACAAAGGGCAGCTAAACCGCGAGGTCAAGCGAATCCCAAAAAGCCGGTCTCAG
TTCGGATTGAAGTCTGCAACTCGACTTCATGAAGCTGGAGTCGCTAGTAATCCCGGATCAGCAACGCCGGGGTGAATACG
TTCCCGGGCCTTGTACACACCGCCCGTCACACGCCGAAAGTCGATAACACCCGAAGTCAGTGGCCCAACCCTTTAGGGAG
GGAGCTGCCGAAGGTGGGATTGGCGATTGGGGTGAAGTCGTAACAAGGTAGCCGTACCGGAAGGTGCGGCTGGATCACCT
CCTTTCT
I have a detailed document with the steps I took to create the custom database as well if that would be helpful.
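A quick consistency check between the two files quoted above can rule out one simple failure mode: reference IDs in the fasta that have no entry in the ref2taxid. The following is a minimal sketch, assuming the ref2taxid is a whitespace-separated two-column file (reference ID, taxid) and that the reference ID is the first token of each fasta header. The function names are hypothetical and the sequences are shortened for illustration.

```python
def fasta_ids(lines):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def ref2taxid_map(lines):
    """Parse 'reference_id taxid' lines into a dict (assumed two-column layout)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def missing_ids(fasta_lines, ref2taxid_lines):
    """Return FASTA IDs that have no entry in the ref2taxid mapping."""
    mapping = ref2taxid_map(ref2taxid_lines)
    return [seq_id for seq_id in fasta_ids(fasta_lines) if seq_id not in mapping]


# IDs and taxids as quoted in this thread; sequences truncated for brevity.
fasta = [
    ">FLASV1.1417", "GATGAACGCTGG",
    ">FLASV2.1445", "GACGAACGCTGG",
    ">FLASV3.1527", "CTTCGACGGAGA",
]
ref2taxid = [
    "FLASV1.1417 895459642",
    "FLASV2.1445 893084087",
    "FLASV3.1527 60446185",
]
print(missing_ids(fasta, ref2taxid))  # []
```

An empty list means every reference in the excerpt has a taxid, so for these three entries the mapping itself is consistent and the "unknown" labels would have to come from a later step.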
Thank you very much! Where do these taxids come from (for example FLASV1.1417, taxid 895459642)? Are they from NCBI?
No, they are from the MiDAS database (https://www.midasfieldguide.org/guide/downloads), which is a curated 16S reference database for wastewater microbiomes. I downloaded the Qiime.fa and QIIME.txt files. I reformatted the Qiime.txt file to include the column headers id, kingdom, phylum, etc., and used taxonkit to create the taxdump files. I used the Qiime.fa file to make the minimap database, and I used the taxid.map that was created alongside the taxdump files as the ref2taxid file.
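The reformatting step described above can be sketched as follows. This is a minimal illustration, assuming the QIIME-style taxonomy file has one record per line in the form `id<TAB>k__...; p__...; ...; s__...`; the exact layout of the MiDAS file is not shown in this thread, so the example line, the placeholder names, and the function names are all hypothetical.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_qiime_line(line):
    """Split one assumed QIIME-style record into an id plus one field per rank."""
    seq_id, lineage = line.rstrip("\n").split("\t", 1)
    names = [part.strip() for part in lineage.split(";") if part.strip()]
    row = {"id": seq_id}
    for rank, name in zip(RANKS, names):
        # Drop a rank prefix such as "k__" or "s__" when present.
        if name[1:3] == "__":
            name = name[3:]
        row[rank] = name
    # Ranks absent from the lineage are kept as empty strings.
    for rank in RANKS:
        row.setdefault(rank, "")
    return row


def missing_ranks(row):
    """List the ranks that ended up empty for this record."""
    return [rank for rank in RANKS if not row[rank]]


# Hypothetical record with placeholder names and no species-level entry.
example = "REF1.1500\tk__KingdomA; p__PhylumB; c__ClassC; o__OrderD; f__FamilyE; g__GenusF"
row = parse_qiime_line(example)
print(row["genus"])        # GenusF
print(missing_ranks(row))  # ['species']
```

Running a check like `missing_ranks` over the whole file before building the taxdump shows whether any rank column, species in particular, is empty or absent in the table handed to taxonkit.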
Operating System
macOS
Other Linux
No response
Workflow Version
v1.1.2
Workflow Execution
EPI2ME Desktop (Local)
Other workflow execution
No response
EPI2ME Version
v5.1.14
CLI command run
No response
Workflow Execution - CLI Execution Profile
None
What happened?
I created a custom database following the documentation provided in this tutorial. I successfully created the taxdump files, minimap2 database, and ref2taxid file. The wf-16s pipeline runs as expected until the makeReport step, where I encounter the error "Process minimap_pipeline:makeReport (1) terminated with an error exit status (1)" with the message `KeyError: "The following 'id_vars' are not present in the DataFrame: ['species']"` (please see the nextflow log attached). I do get the abundances table output, so perhaps I don't really need the full report. However, my question is: what may be causing this error, and is there a file in my custom database that I need to fix in order to get the full report output?
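Since the error message names a missing 'species' column, one thing that can be checked on the database side is whether the custom taxdump contains any species-rank nodes at all. The sketch below is a hedged illustration and not a confirmed diagnosis: it assumes nodes.dmp follows the NCBI taxdump layout (fields separated by `|` with surrounding tabs, rank in the third field), and the example lines use made-up taxids.

```python
from collections import Counter


def rank_counts(nodes_lines):
    """Count how many nodes carry each rank in an NCBI-style nodes.dmp."""
    counts = Counter()
    for line in nodes_lines:
        # Remove the line terminator and the trailing '|' before splitting.
        fields = [f.strip() for f in line.rstrip("\n").rstrip("|").split("|")]
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical nodes.dmp excerpt: tax_id | parent tax_id | rank |
nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tkingdom\t|\n",
    "20\t|\t10\t|\tgenus\t|\n",
]
counts = rank_counts(nodes)
print(counts["genus"])    # 1
print(counts["species"])  # 0
```

In practice the lines would come from `open("nodes.dmp")` in the taxdump directory. A count of zero for `species` would suggest the taxonomy table used to build the taxdump lacked that rank.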
nextflow.log
The first few lines of my output abundance file are below.
Relevant log output
Application activity log entry
Were you able to successfully run the latest version of the workflow with the demo data?
other (please describe below)
Other demo data information