benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Creating a dada2-compliant database for a mock sample #1323

Closed: Jeleel2020 closed this issue 3 years ago

Jeleel2020 commented 3 years ago

Hi Guys,

I am analysing a 16S amplicon dataset that includes a mock sample from ZymoBIOMICS. I have the sequences of the 8 bacterial genera in the mock sample from the supplier. I was able to identify all 8 genera when I assigned taxonomy against a SILVA database. However, when I use the (zip) file from the supplier, I detect none of the bacteria, and I suspect the file needs to be formatted or amended to meet dada2 requirements. Has anyone experienced this before, and how did you resolve it? Thanks in anticipation of your response.

Kind regards, Jeleel.

benjjneb commented 3 years ago

However, when I use the file (zip) from the suppliers, I detected none of the bacteria and suspected that the file needs to be formatted/amended to meet dada2 requirements.

Can you clarify? What file? What suppliers are supplying it? And what are you doing with said file? (Provide the exact command if you can.)

Jeleel2020 commented 3 years ago

Hi,

Thanks for your reply. I am trying to evaluate the accuracy of the mock community using this command:

```r
library(dada2)

# Abundances of each ASV in the sample named "Mock"
unqs.mock <- seqtab.nochim["Mock",]
unqs.mock <- sort(unqs.mock[unqs.mock>0], decreasing=TRUE) # Drop ASVs absent in the Mock
cat("DADA2 inferred", length(unqs.mock), "sample sequences present in the Mock community.\n")
# Reference sequences expected in the mock community
mock.ref <- getSequences(file.path(path, "HMP_MOCK.v35.fasta"))
# Count ASVs that occur exactly (as substrings) within any reference sequence
match.ref <- sum(sapply(names(unqs.mock), function(x) any(grepl(x, mock.ref))))
cat("Of those,", sum(match.ref), "were exact matches to the expected reference sequences.\n")
```

The supplier is Zymo Research. I have attached the folder from the supplier and would like to convert it into the same form as "HMP_MOCK.v35.fasta", which is used in the dada2 tutorial and is available on the dada2 website. It is worth mentioning that the region we targeted may not allow assignment down to species level, but we are fine if we can get to genus level.
ZymoBIOMICS.STD.refseq.v2.zip

Kind regards, Jeleel.

benjjneb commented 3 years ago

The refseq files provided by Zymo are not meant to be used with assignTaxonomy. That said, you ought to be able to match some of your ASVs to the provided database if you are using grepl against a relevant reference fasta file.
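A minimal sketch of the matching idea, using toy sequences made up for illustration: an ASV counts as an exact match if it occurs as a substring of any reference sequence, which is what `grepl(x, mock.ref)` checks in the tutorial code. The helper name `count_exact_matches` is hypothetical. It passes `fixed = TRUE` so each ASV is treated as a literal string rather than a regular expression; for plain nucleotide strings this gives the same answer as the regex default.

```r
# Hypothetical helper: count ASVs that occur verbatim inside any reference sequence.
# asvs: character vector of ASV sequences; refs: character vector of reference sequences.
count_exact_matches <- function(asvs, refs) {
  # For each ASV, TRUE if it is a literal substring of at least one reference
  hits <- vapply(asvs, function(x) any(grepl(x, refs, fixed = TRUE)), logical(1))
  sum(hits)
}

# Toy data (made up): two short "reference" sequences and three "ASVs"
refs <- c("ACGTACGTTTGACCA", "GGGCCCAAATTTGGG")
asvs <- c("CGTTTGA",   # substring of refs[1]
          "CCAAATT",   # substring of refs[2]
          "CGTTAGA")   # one mismatch, present in neither reference
count_exact_matches(asvs, refs)
# [1] 2
```

Because the check is a substring test, a short ASV from one variable region can match a full-length 16S reference, which is why a concatenated file of full-length reference sequences works for this step.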

That isn't HMP_MOCK.v35.fasta in this case. You would want to concatenate the FASTA files found in the ssRNAs directory of the zipped Zymo directory, and then use that.
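One way to do the concatenation from within R, sketched with base R only. The directory path `ZymoBIOMICS.STD.refseq.v2/ssRNAs`, the accepted file extensions, and the output name `zymo_ssu_refs.fasta` are assumptions for illustration; adjust them to what is actually on disk after unzipping. The helper name `concat_fastas` is hypothetical.

```r
# Hypothetical helper: concatenate every FASTA file in a directory into one FASTA file.
# in_dir and out_file are assumptions; point them at the real unzipped folder.
concat_fastas <- function(in_dir, out_file) {
  # Pick up files ending in .fasta, .fa or .fna (case-insensitive), sorted by name
  files <- list.files(in_dir, pattern = "\\.(fasta|fa|fna)$",
                      full.names = TRUE, ignore.case = TRUE)
  if (length(files) == 0) stop("No FASTA files found in ", in_dir)
  # Read each file's lines and stack them; FASTA records simply follow one another
  all_lines <- unlist(lapply(files, readLines, warn = FALSE))
  writeLines(all_lines, out_file)
  invisible(files)
}

# Hypothetical usage, then reuse the tutorial's matching step with the new reference:
# concat_fastas("ZymoBIOMICS.STD.refseq.v2/ssRNAs", "zymo_ssu_refs.fasta")
# mock.ref <- dada2::getSequences("zymo_ssu_refs.fasta")
```

Reading and rewriting line by line, rather than appending raw file contents, avoids gluing the last sequence line of one file onto the next file's header when a file lacks a trailing newline.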

Jeleel2020 commented 3 years ago

Hi Benjamin,

Thanks for your quick response. I will concatenate the FASTA files found in the ssRNAs directory and test the result. I will get back to you on the outcome.

Kind regards, Jeleel.

Jeleel2020 commented 3 years ago

Hi Dr. Benjamin,

After concatenating the files as you suggested, I was able to match the sequences in my mock community to those provided by the supplier. Thank you so much.

Kind regards, Jeleel.