benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

minion 16s sequences and derep function from DADA2 #1175

Closed Debora6991 closed 3 years ago

Debora6991 commented 3 years ago

Hi, I'm trying to dereplicate MinION sequences that have already been quality-processed into a fastq.gz file. I would like to dereplicate those sequences, mimicking the ASV approach. I tried to use the derepFastq function in DADA2, but it doesn't seem to work (R crashes each time). Why is that happening?

This is the code:

path <- "my_data_path"
files <<- file.path(path, "my_folder")
derep <<- derepFastq(files, verbose = TRUE)

And this is the size of my MinION file (the smaller one) compared to an Illumina one, so I suppose file size is not the problem. [image]

I thank you in advance for your reply.

benjjneb commented 3 years ago

How long are your sequences? DADA2 has a hard limit at 9999 nts.

Also, DADA2 is not recommended and probably won't really work for Nanopore data at this time, because the error rates are too high. Stuff like filtering is fine, but the denoising and subsequent processing isn't going to work correctly.
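One way to answer the length question is to tabulate read lengths directly from the FASTQ file and compare the maximum against the 9999 nt limit mentioned above. The following is a minimal sketch in base R, not part of DADA2: `read_lengths` is a hypothetical helper name, and it assumes the common layout of exactly four lines per FASTQ record and a file small enough to load into memory.

```r
# Hypothetical helper (not a DADA2 function): return the length of every read
# in a FASTQ file. Assumes exactly 4 lines per record (no wrapped sequences).
read_lengths <- function(infile) {
  con <- gzfile(infile, "r")          # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  stopifnot(length(lines) %% 4 == 0)  # sanity check on the assumed layout
  seq_idx <- seq_len(length(lines) / 4) * 4 - 2  # 2nd line of each record
  nchar(lines[seq_idx])
}

# Usage (the file name is a placeholder):
# lens <- read_lengths("my_sample.fastq.gz")
# range(lens)   # compare the maximum against the 9999 nt limit
```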

Debora6991 commented 3 years ago

I'm aware of those limitations; I was just trying to find a way to dereplicate my sequences (1500 nucleotides). The preprocessing was already done following the suggested Nanopore pipeline, so I'm not trying to use DADA2 for that. Potentially the dereplication step alone is functional for my purpose, am I wrong?

benjjneb commented 3 years ago

> Potentially the dereplication step alone is functional for my purpose, am I wrong?

Yeah I would think it would work then... not sure why it wouldn't. Can you try to identify a single sample that causes this crash behavior, and then (if you can) share that file with me so I can reproduce the behavior?

Debora6991 commented 3 years ago

I've only tried with a single MinION sample, to keep it simple; you can find it attached. Could it be something to do with the sequence quality? (It's a FASTQ file, though, so it should be the same for all.) [image: image.png]

benjjneb commented 3 years ago

The file may be too big to attach to a GitHub comment. You can email it to me at benjamin DOT j DOT callahan AT gmail DOT com

benjjneb commented 3 years ago

When I run it with the current version of DADA2 (1.17.5) it works, but gives the following warning:

fn <- "~/Desktop/sample.fastq.gz"
drp <- derepFastq(fn)

Warning message:
In qtables2(fq) :
  Zero-length sequences detected during dereplication. They will be ignored.

Gracefully handling zero-length sequences was introduced relatively recently, so you'll need to upgrade to the current devel version (or wait a couple days and it will be released on Bioconductor), or you can filter out the zero-length sequences from your files before dereplicating with an older version of DADA2.
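For the last option, filtering out zero-length sequences before dereplicating, a rough base R sketch might look like the following. It is not a DADA2 function: `drop_empty_reads` and the file names are hypothetical, and the code assumes exactly four lines per FASTQ record and a file that fits in memory.

```r
# Hypothetical helper (not a DADA2 function): copy a FASTQ file while dropping
# records whose sequence line is empty.
# Assumes exactly 4 lines per record (no wrapped sequences).
drop_empty_reads <- function(infile, outfile) {
  con <- gzfile(infile, "r")           # gzfile() also reads uncompressed files
  lines <- readLines(con)
  close(con)
  stopifnot(length(lines) %% 4 == 0)   # sanity check on the assumed layout

  n_rec <- length(lines) / 4
  seq_idx <- seq_len(n_rec) * 4 - 2    # line index of each record's sequence
  keep <- nchar(lines[seq_idx]) > 0    # TRUE for reads with at least 1 base

  out <- gzfile(outfile, "w")          # output is written gzip-compressed
  writeLines(lines[rep(keep, each = 4)], out)
  close(out)

  invisible(c(reads_in = n_rec, reads_out = sum(keep)))
}

# Usage (file names are placeholders):
# drop_empty_reads("sample.fastq.gz", "sample.nonempty.fastq.gz")
# drp <- dada2::derepFastq("sample.nonempty.fastq.gz")
```

Writing to a new file rather than overwriting the input keeps the original reads available for comparison.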

Debora6991 commented 3 years ago

I tried again and got the same result: R shuts down. I'm starting to think my computer is the problem here. I'll wait for the newer version of DADA2 and try again. Thank you for your time; you were very helpful anyway.

Debora
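Before retrying, it may help to confirm which DADA2 version is actually installed. The sketch below compares it against 1.17.5, the version reported to work earlier in this thread. Treating 1.17.5 as the threshold is an assumption, since the thread does not say exactly which release first handled zero-length sequences.

```r
# Assumption: 1.17.5 is used as the reference only because it is the version
# reported to work in this thread, not a documented minimum.
known_good <- package_version("1.17.5")

if (requireNamespace("dada2", quietly = TRUE)) {
  installed <- utils::packageVersion("dada2")
  if (installed >= known_good) {
    message("dada2 ", format(installed), " is at least ", format(known_good))
  } else {
    message("dada2 ", format(installed), " is older than ", format(known_good),
            "; consider upgrading or removing zero-length reads first")
  }
} else {
  message("dada2 is not installed")
}
```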
