Open marwa38 opened 3 months ago
No that does not look like a format that works with assignTaxonomy. You can see the reference format for assignTaxonomy described here: https://benjjneb.github.io/dada2/training.html#formatting-custom-databases
Hello team, Thank you so much for your reply. Do you think this format will now work with assignTaxonomy() as described by DADA2? Is there anything that needs to be changed?
Thanks in advance
The separators between taxonomic levels need to be semicolons, not underscores.
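For anyone hitting the same problem, here is a minimal Python sketch of that header fix. It assumes every underscore in the header is a level separator (so no taxon name contains an underscore itself), and the lineage in the example is only an illustrative placeholder, not taken from the file in this thread. The function name is made up for this example. The trailing semicolon follows the format shown on the DADA2 training page linked above.

```python
def underscores_to_semicolons(header: str) -> str:
    """Turn '>Level1_Level2_..._Level6' into '>Level1;Level2;...;Level6;'.

    Assumption: underscores appear only as separators between levels.
    """
    # Drop the leading '>' and surrounding whitespace, then split on '_'.
    levels = header.strip().lstrip(">").split("_")
    # Re-join with semicolons and close the line with a trailing ';'.
    return ">" + ";".join(levels) + ";"


# Illustrative header only; check the lineage against your own reference.
old = ">Bacteria_Spirochaetes_Spirochaetia_Leptospirales_Leptospiraceae_Leptospira"
print(underscores_to_semicolons(old))
```

If any of your taxon names do contain underscores, this simple split will break them apart, so the headers would need to be rebuilt from the individual levels instead.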
Hello team,
Thank you for your prompt response.
I think this should be okay now.
Do I also have to use semicolons instead of a space between the ID and the genus and species names for assignSpecies, or is the space fine? Kindly assist.
Many thanks
The addSpecies format uses spaces, not semicolons.
You should probably clean up the double >> symbol at the start of your fasta ID lines to >. And I don't remember if the line-ending semicolon is required for the assignTaxonomy format, but given that's in the official description of the format I'd probably add it in.
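A rough Python sketch of that cleanup, in case it helps. It only applies to the assignTaxonomy-style reference file (the addSpecies file uses spaces and should not get semicolons). The function names are hypothetical, and it assumes a plain-text fasta where every header line starts with one or more > characters.

```python
import re


def clean_taxonomy_header(line: str) -> str:
    """Collapse a repeated leading '>' to a single one and ensure a trailing ';'."""
    body = re.sub(r"^>+", "", line.strip())  # remove all leading '>' characters
    if not body.endswith(";"):
        body += ";"                          # add the line-ending semicolon
    return ">" + body


def clean_fasta_text(text: str) -> str:
    """Apply the header cleanup to every header line; leave sequence lines alone."""
    cleaned = []
    for line in text.splitlines():
        cleaned.append(clean_taxonomy_header(line) if line.startswith(">") else line)
    return "\n".join(cleaned) + "\n"


print(clean_fasta_text(">>Level1;Level2;Level3\nACGTACGT\n"))
```

Reading the file, passing its contents through clean_fasta_text, and writing the result to a new file keeps the original untouched while you check the output.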
Thank you so much, this is very helpful :)
Hi again
Can't we add a Level 7 (i.e. species) to the assignTaxonomy format, with a semicolon after it too, instead of using assignSpecies?
Like this
What is meant by the ID in the assignSpecies() format? Is that the accession number? Is the ID optional or mandatory, as in the example below (we put "Leptospira" in the ID position, followed by the genus and species name)?
Thanks again. Marwa
Can't we add a Level 7 (i.e. species) to the assignTaxonomy format, with a semicolon after it too, instead of using assignSpecies?
Yes.
What is meant by the ID in the assignSpecies() format? Is that the accession number? Is the ID optional or mandatory, as in the example below (we put "Leptospira" in the ID position, followed by the genus and species name)?
Yes it is usually something like an accession number. It is "mandatory" in the sense that it has to be included in the formatted ID line, but there isn't a requirement that it is real. So your workaround of just putting in "Leptospira" in the ID position is fine.
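To make the two header layouts concrete, here is a small Python sketch that builds both. The helper names are invented for this example, the lineage and species are illustrative placeholders, and how the species level is written in the seven-level header (full binomial or just the epithet) is an assumption, since the thread does not specify it.

```python
def taxonomy_header(levels):
    """assignTaxonomy-style header: levels joined by ';' with a trailing ';'."""
    return ">" + ";".join(levels) + ";"


def species_header(seq_id, genus, species):
    """Species-assignment-style header: '>ID Genus species', separated by spaces.

    The ID must be present but does not have to be a real accession number.
    """
    return f">{seq_id} {genus} {species}"


# Seven levels, with species as the last one (illustrative names only).
lineage = ["Bacteria", "Spirochaetes", "Spirochaetia", "Leptospirales",
           "Leptospiraceae", "Leptospira", "interrogans"]
print(taxonomy_header(lineage))

# Placeholder ID "Leptospira", as in the workaround described above.
print(species_header("Leptospira", "Leptospira", "interrogans"))
```

Either header line is then followed by the reference sequence on the next line of the fasta file.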
Hi @benjjneb, I am following up on your previous advice regarding DADA2. I used the FASTA file containing my amplicon sequences, formatted according to the assignTaxonomy() function recommendations. I then used the nf-core/ampliseq pipeline to identify Leptospira at the genus level. However, I encountered an issue with repeated Taxa IDs assigned to different Leptospira species, please see attached. I'd now like to use the DADA2 pipeline (https://benjjneb.github.io/dada2/tutorial.html) specifically for the denoising, merging, and chimera control steps, to improve the data quality before subsequent analyses. Unfortunately, I lack experience with the sequence preparation steps required by the pipeline, including demultiplexing, adapter trimming, and removing non-biological adapter sequences from the reads.
Could you please advise me on the necessary tools and steps to prepare my current FASTA file for processing with the DADA2 pipeline and achieve the desired denoising, merging, and chimera control steps?
Thank you in advance. Khadija
ASV_taxa_species.csv
However, I encountered an issue with repeated Taxa IDs assigned to different Leptospira species, please see attached.
I'm not sure what this means.
I'd now like to use the DADA2 pipeline (https://benjjneb.github.io/dada2/tutorial.html) specifically for the denoising, merging, and chimera control steps, to improve the data quality before subsequent analyses. Unfortunately, I lack experience with the sequence preparation steps required by the pipeline, including demultiplexing, adapter trimming, and removing non-biological adapter sequences from the reads.
Could you please advise me on the necessary tools and steps to prepare my current FASTA file for processing with the DADA2 pipeline and achieve the desired denoising, merging, and chimera control steps?
The DADA2 tutorial that you linked is the place to start for understanding how to use DADA2 on your sequencing data.
DADA2 is not intended for use with fasta data, but rather with the fastq data (that also has quality scores) that you get from amplicon sequencing measurements.
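If you are unsure which kind of file you have, a quick check of the first non-empty line usually settles it: fasta records start with > and fastq records start with @. The following is a minimal Python sketch of that check, not part of DADA2. The function names are hypothetical, and it assumes a plain-text or gzip-compressed file.

```python
import gzip


def classify_first_line(line: str) -> str:
    """Guess the format from the first non-empty line of a sequence file."""
    if line.startswith(">"):
        return "fasta"   # header only, no quality scores
    if line.startswith("@"):
        return "fastq"   # record header, followed by sequence and qualities
    return "unknown"


def sniff_sequence_format(path: str) -> str:
    """Open a (possibly gzipped) file and classify its first non-empty line."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                return classify_first_line(line)
    return "unknown"     # empty file


print(classify_first_line("@read1 example"))
print(classify_first_line(">seq1"))
```

A file that comes back as fasta is suitable as a taxonomy reference, but the denoising steps in the tutorial need the original fastq reads from the sequencer.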
Hi team
Could you please let me know whether this fasta file (screenshot below) will work with the dada2 pipeline in place of silva 1.38? Do you think it is compatible with assignTaxonomy()? This fasta file was created by the lab for the known Leptospira species (currently 69 known species). We are not doing 16S microbiota but the secY gene; some info is shared below.
Target organism: Leptospira spp. (currently 69 known species)
Amplicon sequencing method: AmpSeq
We were advised to run dada2 by a previous colleague who ran it with this specific gene and sequencing method.
Thanks in advance Marwa