Closed goodguynickpt closed 3 months ago
It looks to me like your fastqFs and fastqRs are identical? What is the output of identical(fastqFs, fastqRs)?
And what do your filtered filenames (outF) look like?
Hi! Sorry for the late reply, I was doing some field work! Here is the output!
identical(fastqFs, fastqRs)
[1] TRUE
outF
[1] "C:/Users/Lucas/OneDrive - Universidade de Lisboa/Desktop/BURSA_tissues_16S_microbioma_infravec/ZIP Rbursa INFRAVEC/220620-Infravec2-8115/1/Raw_Data/filtered/filtered_f.fastq.gz"
So, initially, I had these two folders: one with the raw data and one with FASTQC.
These are in the Fastqc folder.
The raw folder used to contain only those zipped files, not the "filtered" folder you can see here (I created it in R while trying to process the raw data).
Inside the filtered folder, you can see that not all files are present; they should be in pairs, from 1 to 12. Initially I only had 10, 11 and 12, and I managed, little by little, to get more filtered files.
In your R code, when you are defining the fastq files you want to filter and trim, and the filtered filenames you want to give them, you aren't discriminating between the forward and reverse files.
fastqFs <- list.files(pathF, pattern="fastq.gz", full.names = TRUE)
fastqRs <- list.files(pathR, pattern="fastq.gz", full.names = TRUE)
Those commands yield identical outputs, because pathF and pathR are the same directory.
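A minimal sketch of how the two listings could be told apart when both read directions live in one directory. The `_R1`/`_R2` filename tags and the sample names are assumptions for illustration only (your files may follow a different convention, so the patterns would need adjusting); the demo runs on empty placeholder files in a temporary directory.

```r
# Demo directory with empty placeholder files (hypothetical names).
path <- file.path(tempdir(), "raw_demo")
dir.create(path, showWarnings = FALSE)
demo_files <- c("s1_R1.fastq.gz", "s1_R2.fastq.gz",
                "s2_R1.fastq.gz", "s2_R2.fastq.gz")
invisible(file.create(file.path(path, demo_files)))

# Same directory, but a different pattern for each read direction.
# "_R1"/"_R2" is an assumed naming convention; adapt it to your files.
fastqFs <- sort(list.files(path, pattern = "_R1\\.fastq\\.gz$", full.names = TRUE))
fastqRs <- sort(list.files(path, pattern = "_R2\\.fastq\\.gz$", full.names = TRUE))

# Sanity checks: the vectors must differ, have equal length,
# and pair up sample by sample once the direction tag is removed.
vectors_identical <- identical(fastqFs, fastqRs)
sample.names <- sub("_R1\\.fastq\\.gz$", "", basename(fastqFs))
paired_ok <- identical(sample.names,
                       sub("_R2\\.fastq\\.gz$", "", basename(fastqRs)))
print(vectors_identical)
print(paired_ok)
```

If `vectors_identical` comes back TRUE on your real files, the patterns are not separating the two read directions.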
Then when you define the filtpath you seem to be just creating a single filename for everything (I'm not even sure how you created multiple output files). Finally, when you run filterAndTrim you are filtering forward and reverse files independently, when that isn't how it works -- you need to filter them together.
I'd recommend going back to the dada2 tutorial with a couple of additional bits about filterAndTrim: fastqFs needs to be a vector of the forward (R1) filenames. fastqRs needs to be a vector of the reverse (R2) filenames (in matched order). filtFs and filtRs need to be unique vectors of filenames where the filtered forward/reverse files will be stored. And filterAndTrim should be run just once, on both forward and reverse files together, not separately on each.
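The points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the raw filenames, the `_filt` suffix, and the `filter_pairs` wrapper are hypothetical, and the filtering parameter values are placeholders rather than recommendations for this dataset.

```r
# Hypothetical raw filenames; in practice these would come from list.files().
raw_dir <- "Raw_Data"
fastqFs <- file.path(raw_dir, c("s1_R1.fastq.gz", "s2_R1.fastq.gz"))
fastqRs <- file.path(raw_dir, c("s1_R2.fastq.gz", "s2_R2.fastq.gz"))

# One unique output filename per input file, kept in the same order.
filt_dir <- file.path(raw_dir, "filtered")
filtFs <- file.path(filt_dir,
                    sub("\\.fastq\\.gz$", "_filt.fastq.gz", basename(fastqFs)))
filtRs <- file.path(filt_dir,
                    sub("\\.fastq\\.gz$", "_filt.fastq.gz", basename(fastqRs)))
stopifnot(!anyDuplicated(c(filtFs, filtRs)),
          length(filtFs) == length(fastqFs),
          length(filtRs) == length(fastqRs))

# Single paired call: forward and reverse reads are filtered together.
# filter_pairs is a hypothetical wrapper; the parameter values are placeholders.
filter_pairs <- function(fwd, filt, rev, filt.rev) {
  dada2::filterAndTrim(fwd = fwd, filt = filt, rev = rev, filt.rev = filt.rev,
                       maxN = 0, maxEE = c(2, 2), truncQ = 2,
                       multithread = FALSE)  # the tutorial advises FALSE on Windows
}

# With real files in place, this would be run exactly once:
# out <- filter_pairs(fastqFs, filtFs, fastqRs, filtRs)
```

Note that the value returned by filterAndTrim is a matrix of read counts, so later steps such as learnErrors would take the filtered file paths (`filtFs`, `filtRs`) rather than `out`.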
Thank you! I will go back to the tutorial and try to fix it!
Hey, everyone. Hope everyone is doing ok. I am a master's student and I am writing my master's thesis on the tick microbiome. While I am collecting ticks and doing some lab/field work, I am also using data sets from a PhD student to analyze a few things and add some extra content to my thesis.
Anyway, I've worked with R in the past, for around 6 months, but I am a total noob nowadays.
I am basically self-taught, so I struggle with a myriad of simple things. I usually manage to solve most of them after a few hours, but my R script is now giving me an error message that has me stumped.
I am trying to apply the code from this tutorial (https://benjjneb.github.io/dada2/tutorial.html) to my dataset and get a few graphs out of it.
I am currently trying to run this code:
Step 0: Install Rtools (If you haven't already)
Download from https://cran.rstudio.com/bin/windows/Rtools/ and install
Step 1: Install the 'dada2' Package
install.packages("dada2")
library(dada2)
Step 2: File Paths (Adjusted according to your locations)
File Paths
pathF <- "C:/Users/Lucas/OneDrive - Universidade de Lisboa/Desktop/BURSA_tissues_16S_microbioma_infravec/ZIP Rbursa INFRAVEC/220620-Infravec2-8115/1/Raw_Data"
pathR <- pathF # Same as pathF since they are in the same directory
filtpath <- "C:/Users/Lucas/OneDrive - Universidade de Lisboa/Desktop/BURSA_tissues_16S_microbioma_infravec/ZIP Rbursa INFRAVEC/220620-Infravec2-8115/1/Raw_Data/filtered"
Step 3: Load Sample File Paths
fastqFs <- list.files(pathF, pattern="fastq.gz", full.names = TRUE)
fastqRs <- list.files(pathR, pattern="fastq.gz", full.names = TRUE)
Step 4: Inspect Read Quality
plotQualityProfile(fastqFs[1:2])
plotQualityProfile(fastqRs[1:2])
File Paths
pathF <- "C:/Users/Lucas/OneDrive - Universidade de Lisboa/Desktop/BURSA_tissues_16S_microbioma_infravec/ZIP Rbursa INFRAVEC/220620-Infravec2-8115/1/Raw_Data"
pathR <- pathF # Same as pathF since they are in the same directory
filtpath <- "C:/Users/Lucas/OneDrive - Universidade de Lisboa/Desktop/BURSA_tissues_16S_microbioma_infravec/ZIP Rbursa INFRAVEC/220620-Infravec2-8115/1/Raw_Data/filtered"
Adjusted filterAndTrim command
outF <- file.path(filtpath, "filtered_f.fastq.gz")
outR <- file.path(filtpath, "filtered_r.fastq.gz")
out <- filterAndTrim(fwd = fastqFs[1], filt = outF, rev = fastqRs[1], multithread = TRUE)
Step 6: Error Learning
errF <- learnErrors(out[[1]], multithread = TRUE)
errR <- learnErrors(out[[2]], multithread = TRUE)
Step 7: Sample Inference
dadaFs <- dada(out[[1]], err = errF, multithread = TRUE)
dadaRs <- dada(out[[2]], err = errR, multithread = TRUE)
Step 8: Merging
mergers <- mergePairs(dadaFs, filtpath, dadaRs, filtpath)
Step 9: Construct Sequence Table
seqtab <- makeSequenceTable(mergers)
Step 10: Remove Chimeras
seqtab.nochim <- removeBimeraDenovo(seqtab)
Unfortunately, I always get an error in the filter and trim command. Console output:
https://cran.rstudio.com/bin/windows/Rtools/
Warning in install.packages :
  package ‘dada2’ is in use and will not be installed
The filter and trim error changes randomly between the "output files for the reverse reads are required" above and
I've tried pretty much everything I can think of and I've been stuck on this for weeks.
I was trying to run some code that was basically a copy-paste version of the tutorial I pasted above but it was not running either.
I would truly appreciate some help! Thank you. This is the only plot I can get from running the whole thing, before it comes crashing down.