Question about using Deblur in meta analysis

Hi, I'm working on a meta-analysis of ~25 Illumina 18S rRNA amplicon datasets (all from different studies and different anoxic marine environments) and have a question about when it is appropriate to merge the data.

Since Deblur uses a static error model, it should be fine to deblur all of the cleaned, trimmed, and merged sequences from all of the studies together. I am running the V4 and V9 studies separately and making sure the data I run together are of the same length and from the same region of the gene. So my plan is to preprocess the reads outside of QIIME 2 and then import them all together to speed up and simplify my pipeline.

Is this reasonable? I have seen some examples in the literature of similar meta-analyses using Deblur that denoise each study separately and then merge them with merge-seqs (e.g. https://www.nature.com/articles/s41396-020-00814-9#MOESM3). I haven't seen examples of what I am planning to do and want to make sure everything I do is justified, especially since I am so new to bioinformatics.

Thank you!

Edit: sorry, I realize this should have been posted on the QIIME 2 forum.

Hi Anna, Deblur treats each sample independently, so it does not matter whether you process each dataset separately or together. Only one final step, the filtering of low-abundance reads, depends on all the samples processed. The default (--min-reads=10) removes every sequence that appears in fewer than 10 reads in total across all samples processed together. Pooling the studies can therefore retain a rare sequence that would fall below the threshold in each study processed on its own. However, 10 is an arbitrary number (it stems from the fact that not much statistical analysis is possible on sequences that appear in a very small total number of reads). So both options seem fine to me; use whichever is more convenient for you.
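To make the effect of that last filtering step concrete, here is a minimal Python sketch of a total-abundance filter applied per study versus on the pooled samples. It is a toy illustration, not Deblur's own code: the function names, sample IDs, sequence labels, and counts are all made up, and the only thing taken from the reply above is the threshold of 10 total reads.

```python
from collections import Counter

MIN_READS = 10  # threshold quoted above; everything else here is hypothetical


def filter_low_abundance(samples, min_reads=MIN_READS):
    """Drop sequences whose total count across all given samples is < min_reads.

    `samples` maps sample ID -> {sequence: read count}.
    """
    totals = Counter()
    for counts in samples.values():
        totals.update(counts)  # adds the per-sample counts to the running totals
    keep = {seq for seq, total in totals.items() if total >= min_reads}
    return {
        sample_id: {seq: n for seq, n in counts.items() if seq in keep}
        for sample_id, counts in samples.items()
    }


def surviving_sequences(samples):
    """Return the set of sequences present in any sample."""
    return {seq for counts in samples.values() for seq in counts}


# Two made-up studies sharing one abundant and one rare sequence.
study_a = {"A1": {"SEQ_COMMON": 50, "SEQ_RARE": 6}}
study_b = {"B1": {"SEQ_COMMON": 40, "SEQ_RARE": 7}}

# Option 1: filter each study on its own, then combine the results.
separate = surviving_sequences(filter_low_abundance(study_a)) | surviving_sequences(
    filter_low_abundance(study_b)
)

# Option 2: pool the samples first, then filter once.
pooled = surviving_sequences(filter_low_abundance({**study_a, **study_b}))

print(sorted(separate))
print(sorted(pooled))
```

In this toy data the rare sequence has 6 and 7 reads in the two studies, so it is dropped when each study is filtered alone but kept when the studies are pooled (13 reads in total). Whether that difference matters depends on how much weight the downstream analysis places on very rare sequences.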

(In reply to the question above, posted by anna-schrecengost on Fri, Feb 19, 2021 at 7:04 PM: https://github.com/biocore/deblur/issues/206)

biocore/deblur, issue #206