benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Installing dada2 version 1.9.0 #1418

Closed DhebbieF closed 1 month ago

DhebbieF commented 2 years ago

Hi, I am trying to install a newer version of dada2 for my PacBio SMRT Link sequences. I realised that I have dada2 version 1.2.0, and it is not doing well with these sequences. I read through a number of similar issues and saw that you suggested a researcher upgrade to 1.9.1 (https://github.com/benjjneb/dada2/issues/374):

"Caveats. You should get version 1.9.1, and it probably won't work well (or at all) for early PacBio data, i.e. data on pre-P6C4 chemistries, and that was processed with old software versions (when it was named SMRT Portal rather than SMRT Link)."

I tried upgrading with:

devtools::install_github("benjjneb/dada2", ref="v1.9.1") # change the ref argument to get other versions

but got the following error:

"Error: Failed to install 'dada2' from GitHub: Timeout was reached: [api.github.com] Resolving timed out after 10000 milliseconds"

I then loaded devtools first:

library(devtools)
devtools::install_github("benjjneb/dada2", ref="v1.9.1") # change the ref argument to get other versions

which gave another error:

Downloading GitHub repo benjjneb/dada2@v1.9.1
"Error in utils::download.file(url, path, method = method, quiet = quiet, : cannot open URL 'https://api.github.com/repos/benjjneb/dada2/tarball/v1.9.1'"

Can you please guide me through the dada2 upgrade?

I use R version 4.1.1, Bioconductor version 3.13.1, and BiocManager 1.30.16.

Thank you. Deborah

DhebbieF commented 2 years ago

As a follow-up: my version is 1.20.0, not 1.2.0, so I think it is fairly up to date.

I'm now wondering why I have less than one-tenth of my sequences left after denoising. The sequences are PacBio HiFi reads.

Thank you.
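The version numbers in this exchange are easy to misread, because each component is compared as an integer rather than as a decimal. As a minimal sketch, base R's utils::compareVersion can check the ordering of the three version strings quoted in the thread. The variable names below are illustrative only. In a session where dada2 is installed, packageVersion("dada2") would report the actual installed version.

```r
# Version strings taken from the thread (variable names are illustrative).
installed <- "1.20.0"  # the version the poster actually has
suggested <- "1.9.1"   # the version recommended in issue #374
mistaken  <- "1.2.0"   # the version first reported by mistake

# compareVersion() returns 1 if the first version is later,
# 0 if the two are equal, and -1 if the first is earlier.
newer_than_suggested <- utils::compareVersion(installed, suggested)
older_than_suggested <- utils::compareVersion(mistaken, suggested)

print(newer_than_suggested)
print(older_than_suggested)
```

Under this comparison 1.20.0 sorts after 1.9.1, while 1.2.0 sorts before it.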

benjjneb commented 2 years ago

Yes, you have an up-to-date version, so that isn't the problem.

> I'm now wondering why I have less than one-tenth of sequences after denoising, the sequences are PacBio Hifi reads.

There is quite a bit of variation in PacBio HiFi reads, especially given the changes that have happened since the technology was first introduced. Can you describe in more detail the amplicon you are sequencing, especially its length, and the specifics of the HiFi reads, i.e. what chemistry and instrument is being used? And what environment is being sampled?

The next thing I would look at is dereplication stats for a couple of samples. That is, when you run derepFastq("path/to/a_hifi_sample.fastq", verbose=TRUE), how many unique sequences are found? That number should be significantly lower than the total number of reads.

DhebbieF commented 2 years ago

Thank you for your quick response,

Here are the specifics:

Length of amplicon sequenced: full-length 16S. The length histogram generated in dada2 puts all lengths at ~1500.

The details sent by the sequencing centre say the instrument used was the PacBio Sequel System, generating Circular Consensus Sequences (CCS). We were also sent, as a PDF, the exact protocol used in your paper (High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution) as the protocol used to generate our reads ("Procedure & Checklist - Full-Length 16S Amplification, SMRTbell® Library Preparation and Sequencing"), using the KAPA HiFi HotStart PCR Kit for amplification.

The environment being sampled is a murky flowing river.

The dereplication stats for 4 of these samples are as follows:

drp <- derepFastq(filts, verbose=TRUE)

Encountered 1265 unique sequences from 1409 total sequences read.

Encountered 191 unique sequences from 210 total sequences read.

Encountered 2211 unique sequences from 2313 total sequences read.

Encountered 756 unique sequences from 893 total sequences read.

I used the following parameters for filtering and trimming (following your tutorial on this page): https://benjjneb.github.io/LRASManuscript/LRASms_fecal.html

track <- filterAndTrim(nops2, filts, minQ=3, minLen=1000, maxLen=1600, maxN=0, rm.phix=FALSE, maxEE=2, verbose = TRUE)

The filterAndTrim step also cut the number of reads roughly in half.

Thank you.
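To make the dereplication counts reported above easier to compare against the guidance that the number of unique sequences should be well below the total, here is a minimal base-R sketch that computes the unique fraction per sample. The numbers are copied from the derepFastq messages in this comment. The vector names are illustrative and are not part of dada2.

```r
# Counts copied from the derepFastq() messages reported above
# (vector names are illustrative).
total_reads <- c(1409, 210, 2313, 893)
unique_seqs <- c(1265, 191, 2211, 756)

# Fraction of reads in each sample that are unique sequences.
unique_fraction <- unique_seqs / total_reads

print(round(unique_fraction, 3))
```

In all four samples the unique fraction falls between roughly 85% and 96%, so the unique count is close to the total rather than significantly below it.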

benjjneb commented 2 years ago

One more question: Did you do the primer removal and orientation step yourself? If so, were you also following the LRAS tutorial (e.g. using removePrimers(..., orient=TRUE))?

DhebbieF commented 2 years ago

Yes, I did (just like in the tutorial). I defined my primers as follows:

FWD.Pr <- "AGRGTTYGATYMTGGCTCAG"
RV.Pr <- "RGYTACCTTGTTACGACTT"
rc <- dada2:::rc

and then removed them using the line:

prim2 <- removePrimers(Pac_F, nops2, primer.fwd=FWD.Pr, primer.rev=dada2:::rc(RV.Pr), orient=TRUE, verbose=TRUE)
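For readers unfamiliar with the reverse-complement step, the following is a minimal base-R sketch of the sequence that the reverse primer becomes once it is reverse-complemented. The helper rev_comp is hypothetical and written only for illustration. It is not dada2's internal rc function, and it assumes a single string of IUPAC nucleotide codes.

```r
# Hypothetical helper (not part of dada2): reverse-complement one
# nucleotide string, including IUPAC ambiguity codes.
rev_comp <- function(seq) {
  # Complement each base; ambiguity codes map to their complements
  # (R <-> Y, K <-> M, B <-> V, D <-> H; S, W and N are unchanged).
  comp <- chartr("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN", toupper(seq))
  # Reverse the order of the complemented bases.
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

RV.Pr <- "RGYTACCTTGTTACGACTT"  # reverse primer from the comment above
rv_rc <- rev_comp(RV.Pr)
print(rv_rc)
```

Applying the function twice returns the original primer, which is a quick sanity check on the mapping.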

DhebbieF commented 2 years ago

Hi, just stopping by to check for any follow-up or tips on this issue.

Also, out of curiosity: if primers have already been removed from PacBio sequences, is it possible to skip the "remove primers" step and just do the filterAndTrim step?