benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

How to generate single forward reads dadaFs clustering data.frame output for many samples? #433

Closed galud27 closed 6 years ago

galud27 commented 6 years ago

Hi Benjamin, In a previous issue I asked how to generate the data.frame you get for paired-end sequences with: mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE). You mentioned that the dada-class objects themselves have such a data.frame: dadaFs[[1]]$clustering (for sample 1), and so on. I have studies with many samples, and I was wondering if there is a way to join the $clustering data.frames from all the samples into one data.frame.

I'm sorry, I was trying to go a different way and generate the reverse reads myself so I could use mergers, but I don't think my sequences look good at all when I look at the quality profiles.

Thank you so much for your help!!

benjjneb commented 6 years ago

You can very easily make the sequence table from the dada-class objects:

st <- makeSequenceTable(dadaFs)

Is that what you want to do? Or do you actually want to "stack" the $clustering data.frames from each sample into one giant data.frame?
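For the second option, stacking could look like the base-R sketch below. Since real dada-class objects require the dada2 package, a small mock clusterings list (with made-up names and values) stands in for lapply(dadaFs, function(x) x$clustering); real $clustering data.frames have more columns, but the idea is the same:

```r
# Mock stand-in for the per-sample dadaFs[[name]]$clustering data.frames.
clusterings <- list(
  sampleA = data.frame(sequence = c("ACGT", "TTGA"), abundance = c(10L, 3L)),
  sampleB = data.frame(sequence = "ACGT", abundance = 7L)
)

# Stack the per-sample data.frames into one, tagging each row with its sample
# so you can still tell which sample each sequence variant came from.
stacked <- do.call(rbind, lapply(names(clusterings), function(name) {
  df <- clusterings[[name]]
  df$sample <- name
  df
}))
```

With real data you would replace the mock list with the $clustering data.frames extracted from your dada-class objects.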

galud27 commented 6 years ago

Ben, Yes, I'm trying to stack all the $clustering data.frames of all my samples into one data.frame.

I'm able to do that when I have forward and reverse reads, because I can generate the data.frame with all the samples stacked together by doing: mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)

Once I have the data.frame, I write one csv file per sample with the abundance, forward and reverse sequences, and the other columns in the data.frame:

dir.create('merged')
for(name in names(mergers)){
    write.csv(mergers[[name]], paste0('merged/', name, '.csv'), quote = F, row.names = F)
}

What I'm ultimately hoping to do is write fasta files (including all the unique sequences) and run them through a different pipeline using a phylogenetic placement approach, then compare this to other OTU clustering methods.

Let me know if you think this is possible with the single forward reads I have now.

Thank you!!

benjjneb commented 6 years ago

So I think you can get an equivalent output to the above by just looping through the dadaFs objects (which is a list, just like mergers):

dir.create('forward')  # create the output directory before writing into it
for(name in names(dadaFs)){
    write.csv(dadaFs[[name]]$clustering, paste0('forward/', name, '.csv'), quote = F, row.names = F)
}

It won't have the same columns, but some will be the same (including $sequence and $abundance). Does that work?

You can also use the uniquesToFasta function to write out fastas for each sample. Just do the same loop as above, but call uniquesToFasta(dadaFs[[name]], paste0('forward/', name, '.fa')) within the loop.

galud27 commented 6 years ago

Yes, looping the dadaFs works and gives me the columns I need!

Just a quick question: would the uniquesToFasta output be the same as a fasta file I generate myself from the $sequence and $abundance columns of the dadaFs data.frame?

I thought that with the merger output I could generate all the fasta sequences, whereas uniquesToFasta would only give the most representative unique sequences.

Thank you so much for your help!

benjjneb commented 6 years ago

uniquesToFasta will write a fasta containing each sequence in the $sequence column, with the $abundance recorded in the fasta id line in the size=XXX format used by usearch/uchime.
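As an illustration of that header convention, here is a base-R sketch of what such a fasta looks like. The sq1-style ids, the sequences, and the abundances are all made up for the example; check the headers in your own uniquesToFasta output for the exact format:

```r
# A usearch/uchime-style fasta: one ">id;size=N;" header line per unique
# sequence, followed by the sequence itself.
unqs <- c(ACGTACGT = 10L, TTGACCAA = 3L)  # named vector: sequence -> abundance

fasta_lines <- unlist(lapply(seq_along(unqs), function(i) {
  c(paste0(">sq", i, ";size=", unqs[[i]], ";"), names(unqs)[[i]])
}))
writeLines(fasta_lines)
```

Downstream tools that understand the size= annotation (e.g. uchime) can then use those abundances directly.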

galud27 commented 6 years ago

Ok, great! Thank you.