biocore / deblur

Deblur is a greedy deconvolution algorithm based on known read error profiles.
BSD 3-Clause "New" or "Revised" License

Can you recommend some reference databases (Positive mode) for ITS and 18S sequences? #205

Open Listen-Lii opened 3 years ago

Listen-Lii commented 3 years ago

Hi, the default database in Deblur for 16S sequences is Greengenes. Can you recommend some reference databases (positive mode) for ITS and 18S sequences?

amnona commented 3 years ago

Hi, I do not have experience with ITS/18S databases. In general, the final filtering step (for which you are looking for a database) is not mandatory. I would recommend deblurring without this filtering, then looking at the results to see whether non-ITS/18S sequences are dominant, which ones, and what their identity/origin is. Then you can decide whether you want to apply the filtering and assess its performance.

Amnon

Listen-Lii commented 3 years ago

Thank you for your prompt reply! A positive filtering database is required according to the Deblur help documentation. How can I skip this?

amnona commented 3 years ago

A simple way would be to run the deblur workflow without supplying the positive filtering database. This will use the greengenes database (which is not relevant to your data). But then from the deblur output, you can use the "all.biom" table instead of the "reference-hit.biom". The "all.biom" table will contain all deblurred sequences (without the positive filtering step).

Good luck Amnon
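A minimal sketch of the suggestion above, assuming a standard `deblur workflow` installation. The input file name, output directory and trim length are placeholders, not values from this thread, and the helper names are made up for illustration. It only assembles the command (with no positive-filter database supplied) and points at the two output tables mentioned in the reply:

```python
from pathlib import Path


def build_deblur_command(seqs_fp, output_dir, trim_length):
    """Assemble a `deblur workflow` call without a positive-filter database.

    Because no positive reference is passed, Deblur falls back to its
    default (Greengenes) for the final filtering step.
    """
    return [
        "deblur", "workflow",
        "--seqs-fp", str(seqs_fp),
        "--output-dir", str(output_dir),
        "-t", str(trim_length),  # trim length: placeholder, choose for your data
    ]


def output_tables(output_dir):
    """Paths of the unfiltered and positive-filtered tables named in the reply."""
    out = Path(output_dir)
    return {
        "unfiltered": out / "all.biom",
        "positive_filtered": out / "reference-hit.biom",
    }


# Hypothetical file names, for illustration only.
cmd = build_deblur_command("its_reads.fna", "deblur_out", 150)
print(" ".join(cmd))
print(output_tables("deblur_out")["unfiltered"])
```

The list in `cmd` could be passed to `subprocess.run(cmd, check=True)`; downstream analysis would then read the `unfiltered` path rather than the `positive_filtered` one.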

Listen-Lii commented 3 years ago

Thank you so much! By the way, may I ask you another question? Deblur performs quality control on each sample separately. After this, are all the samples mixed together to get the ASV table and representative sequences? Or do you get ASVs separately for each sample and then merge these tables? Recently I have found that whether or not samples are pooled has a great impact on ASV numbers. Here is the link: https://github.com/benjjneb/dada2/issues/1194. Do you have any thoughts on this? Thanks again for your kind help.

amnona commented 3 years ago

Hi, Deblur works on each sample independently (so it doesn't do any pooling) and then combines the results into a single table. The only (optional) Deblur step that takes into account the ASV distribution across multiple samples is the final --min-reads flag (10 by default), which removes all ASVs with fewer than min-reads reads in total over all samples combined (you can disable this step by providing --min-reads 0).

Regarding looking at singletons, this can be problematic. Deblur (when working on each sample) first throws away all singletons and then processes all remaining reads. This is because singletons introduce a large degree of discreteness into the denoising process (which uses a smooth noise distribution), so they are hard to clean, and they contain a very large amount of sequencing artifacts. Deblur therefore introduces a non-linearity in the singletons, which may be problematic for downstream analysis that specifically uses the number of singletons.

As an alternative, maybe try deblurring the data and then rarefying to 1/2 the number of resulting reads per sample (or even 1/3 or lower if you have enough reads?). I think this can produce singletons after the denoising (and these singletons should be identical to what you would get if you sampled your population, without read errors, to a lower depth). So I think this may be a relatively valid way to overcome the problem of singletons?

Does this help? Amnon
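The rarefying idea can be illustrated with a small sketch. This is not part of Deblur; it is plain Python with made-up counts, and the function name `rarefy` and the fixed seed are assumptions for the example. It subsamples one sample's deblurred ASV counts without replacement to half its depth, after which some ASVs may end up with a single read:

```python
import random


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a dict of ASV -> read count."""
    # Expand the counts into one entry per read, then subsample the reads.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    rarefied = {}
    for asv in picked:
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


# Made-up deblurred counts for one sample (no singletons, as after Deblur).
sample = {"ASV_A": 40, "ASV_B": 6, "ASV_C": 2}
half_depth = sum(sample.values()) // 2

sub = rarefy(sample, half_depth)
singletons = [asv for asv, n in sub.items() if n == 1]
print(sub, singletons)
```

Which ASVs become singletons depends on the random draw, so in practice the subsampling would be repeated with several seeds.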

Listen-Lii commented 3 years ago

Thank you for your answer! Compared to singletons, I think unique ASVs (i.e. species that occur in only one sample) are more likely to be a problem across different ASV algorithms. I found that in DADA2 a high number of ASVs appear in only one sample, even though all samples were parallel replicates. Deblur performed well in my test. So does Deblur do anything specific in this situation?

amnona commented 3 years ago

Deblur treats each sample independently. After deblurring each sample, all deblurred samples are joined into a single biom table, and sequences with fewer than 10 reads in total (over all samples) are removed (this is controlled by the --min-reads parameter, which can be set to 0 to disable this step). When deblurring each sample, Deblur first throws away all singleton reads from the sample, and then proceeds to the rest of the denoising steps on the sample (removing phiX sequences, denoising, removing chimeras).

Does this make sense?
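The order of operations described above can be sketched in a few lines of plain Python. This is a toy illustration with made-up counts, not Deblur's code: the function names are hypothetical, and the phiX removal, denoising and chimera removal steps are left out entirely. Only the per-sample singleton removal and the study-wide `--min-reads` style filter are modelled:

```python
from collections import Counter


def drop_singletons(sample_counts):
    """Per-sample step: discard sequences seen only once in this sample."""
    return {seq: n for seq, n in sample_counts.items() if n > 1}


def join_and_filter(samples, min_reads=10):
    """Join per-sample results, then drop sequences whose total over all
    samples is below `min_reads` (0 keeps everything)."""
    totals = Counter()
    for counts in samples.values():
        totals.update(counts)
    keep = {seq for seq, total in totals.items() if total >= min_reads}
    return {
        name: {seq: n for seq, n in counts.items() if seq in keep}
        for name, counts in samples.items()
    }


# Made-up read counts per sequence for two samples.
raw = {
    "s1": {"AAA": 7, "CCC": 1, "GGG": 3},
    "s2": {"AAA": 5, "GGG": 2, "TTT": 1},
}

# Each sample is processed on its own; no pooling across samples.
per_sample = {name: drop_singletons(counts) for name, counts in raw.items()}

table = join_and_filter(per_sample, min_reads=10)
unfiltered = join_and_filter(per_sample, min_reads=0)
print(table)
print(unfiltered)
```

In this toy data only `AAA` (12 reads in total) survives the default threshold, while `GGG` (5 reads in total) is kept only when the threshold is set to 0.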

Listen-Lii commented 3 years ago

Thank you very much for your reply!