benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

How to continue with the pipeline, ignoring samples that don't pass the filter #594

Closed Ateuchus closed 5 years ago

Ateuchus commented 5 years ago

Hi,

I have a number of samples that don't pass the filtering, which is absolutely fine (I don't want to try and salvage them), and so no new file is written to the filtered output (makes sense). However, when I then go to plot the errors, it is obviously looking for those non-existent files. I was just wondering if there was a function to remove these samples from the set easily, or whether I should just go and find these files and remove them myself and start again.

I know, I know... I'm just being lazy, but if nothing else, I think it would be a nice little addition to the pipeline!

This is the error:

errF <- learnErrors(filtFs, multithread=TRUE)
Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning message:
In open.connection(con, "rb") :
  cannot open file 'Blahblah/filtered/125_P_B_F_filt.fastq.gz': No such file or directory

(And, just so you can see, the filtering step had already warned:)
The filter removed all reads: blahblah/filtered/125_P_B_F_filt.fastq.gz and blahblah/filtered/125_P_B_R_filt.fastq.gz not written.

Also, on a slightly different note: I'm just testing out the pipeline on some CO1 metabarcoding data. I can't really see why this wouldn't work, but I read that someone's results contained a lot of stop codons. Do you know if this is an appropriate pipeline to use, and has it been compared to any other method? I had a quick look at the literature and couldn't find much. Personally, I like this program for my 16S/18S sets, so I'm hoping it all works.

Thank you!

benjjneb commented 5 years ago

I was just wondering if there was a function to remove these samples from the set easily, or whether I should just go and find these files and remove them myself and start again.

Yes, this is admittedly a little annoying. The way we suggest handling it is to check for the existence of the filtered files after filtering, and then drop the samples whose files don't exist.

out <- filterAndTrim(fnFs, filtFs, ...)
# keep only the samples whose filtered files were actually written
not.lost <- file.exists(filtFs)  # alternatively, not.lost <- out[,"reads.out"] > 0
filtFs <- filtFs[not.lost]
# possibly also update filtRs and sample.names at this point in the same way
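To make that last comment concrete, here is a minimal continuation sketch. It assumes the paired-end objects from the standard tutorial (filtRs and sample.names) and simply applies the same subsetting before resuming at the step that failed for you; adjust the names to your own workflow.

filtRs <- filtRs[not.lost]              # keep reverse-read files in sync with filtFs
sample.names <- sample.names[not.lost]  # and the sample names
errF <- learnErrors(filtFs, multithread=TRUE)  # the step that previously errored
errR <- learnErrors(filtRs, multithread=TRUE)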

Also, on a slightly different note: I'm just testing out the pipeline on some CO1 metabarcoding data. I can't really see why this wouldn't work, but I read that someone's results contained a lot of stop codons. Do you know if this is an appropriate pipeline to use, and has it been compared to any other method?

In principle it should work on CO1 data, but admittedly I have not done any testing on CO1 myself, and I'm not aware of any published validations. I think it should work, but some double-checking is justified (e.g. checking for stop codons), along the lines of the sketch below.
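This is not part of the dada2 documentation, just a rough sketch of one way to do that check. It assumes the Biostrings package, an ASV table named seqtab from makeSequenceTable(), the invertebrate mitochondrial genetic code ("SGC4"), and that the amplicon starts in reading frame 1; all of these are assumptions to adapt to your own primers and taxa.

library(dada2)
library(Biostrings)
seqs <- DNAStringSet(getSequences(seqtab))                      # ASV sequences
in.frame <- subseq(seqs, start=1, end=3 * (width(seqs) %/% 3))  # trim to whole codons
aa <- translate(in.frame, genetic.code=getGeneticCode("SGC4"))  # invertebrate mito code
table(vcountPattern("*", aa) > 0)                               # ASVs with an in-frame stop codon

If your primers do not place the amplicon in frame 1, repeat with start=2 or start=3 (again trimming to whole codons) and use the frame that minimizes stop codons.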

Ateuchus commented 5 years ago

Lovely, thank you!

So I've just had a look at what's been generated, and I'm getting a lot of stop codons as well. A little more searching has revealed that this is probably a common issue with CO1 metabarcoding in general, as a lot of the published sequences have the same issue. Now trying to figure out what to do next!