benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
GNU Lesser General Public License v3.0

How to generate single forward reads dadaFs clustering data.frame output for many samples? #433

Closed galud27 closed 6 years ago

galud27 commented 6 years ago

Hi Benjamin, in a previous issue I asked how to generate the data frame that you get with paired-end sequences using:

mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)

You mentioned that the dada-class objects themselves have such a data.frame: dadaFs[[1]]$clustering (for sample 1), and so on. I have studies with many samples, and I was wondering if there is a way to join the dadaFs output for all the samples into one data frame.

I'm sorry, I was trying a different approach, generating the reverse reads myself and using mergers, but I don't think my sequences look good at all when I look at the quality profiles.

Thank you so much for your help!!

benjjneb commented 6 years ago

You can very easily make the sequence table from the dada-class objects:

st <- makeSequenceTable(dadaFs)

Is that what you want to do? Or do you actually want to "stack" the $clustering data.frames from each sample into one giant data.frame?

galud27 commented 6 years ago

Ben, yes, I'm trying to stack the $clustering data.frames of all my samples into one data.frame.

I'm able to do that when I have forward and reverse reads, because I can generate the data.frame with all the samples stacked together by doing:

mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)

Once I have the data.frame, I write CSV files with the abundance, the forward and reverse information, and the other columns in the data.frame:

dir.create('merged')
for(name in names(mergers)){
    write.csv(mergers[[name]], paste0('merged/', name, '.csv'), quote = F, row.names = F)
}

What I'm finally hoping to do is to write fasta files (all the fasta files generated, including the unique ones), run them in a different pipeline using a phylogenetic placement approach, and compare this to other OTU clustering methods.

Let me know if you think this could be possible with the single forward reads I have now.

Thank you!!

benjjneb commented 6 years ago

So I think you can get an equivalent output to the above by just looping through the dadaFs objects (which is a list, just like mergers):

dir.create('forward')
for(name in names(dadaFs)){
    write.csv(dadaFs[[name]]$clustering, paste0('forward/', name, '.csv'), quote = F, row.names = F)
}

It won't have the same columns, but some will be the same (including $sequence and $abundance). Does that work?

You can also use the uniquesToFasta function to write out fastas for each sample. Just do the same loop as above, but call uniquesToFasta(dadaFs[[name]], paste0('forward/', name, '.fa')) within the loop.
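For the "one giant data.frame" variant raised earlier in the thread, a minimal sketch follows. It uses a mock list in place of dadaFs so that it runs without dada2 installed; the sample names, sequences, and abundances are made up, and the added sample column is an assumption of this sketch, not something dada2 produces.

```r
# Mock stand-in for dadaFs: a named list whose elements each carry a
# $clustering data.frame (hypothetical sample names and values).
dadaFs_mock <- list(
  sampleA = list(clustering = data.frame(
    sequence = c("ACGT", "ACGA"), abundance = c(10L, 3L),
    stringsAsFactors = FALSE)),
  sampleB = list(clustering = data.frame(
    sequence = c("ACGT", "TTGA"), abundance = c(7L, 2L),
    stringsAsFactors = FALSE))
)

# Tag each per-sample data.frame with its sample name, then row-bind them
# into a single long data.frame.
stacked <- do.call(rbind, lapply(names(dadaFs_mock), function(name) {
  data.frame(sample = name, dadaFs_mock[[name]]$clustering,
             stringsAsFactors = FALSE)
}))
rownames(stacked) <- NULL
```

With real data, replacing dadaFs_mock with dadaFs should give one row per sample and sequence, provided every sample's $clustering has the same columns.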

galud27 commented 6 years ago

Yes, looping the dadaFs works and gives me the columns I need!

Just a quick question: would the uniquesToFasta output be the same as what I would get if I generated a fasta file from the $sequence and $abundance columns of the dadaFs data.frame output?

I thought that with the mergers output I could generate all the fasta sequences, and that uniquesToFasta would only give the most representative unique sequences.

Thank you so much for your help!

benjjneb commented 6 years ago

uniquesToFasta will write a fasta that contains each sequence in the $sequence column, with the $abundance written in the id line of the fasta in the size=XXX format that is used by usearch/uchime.
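To illustrate what such a file looks like, here is a hand-rolled base R sketch that writes a fasta from $sequence and $abundance columns with size= annotations in the id lines. It is not the dada2 implementation: the mock data and the "sq<i>" naming are assumptions, and the exact id text written by uniquesToFasta may differ.

```r
# Mock $clustering data.frame (hypothetical sequences and abundances).
clustering <- data.frame(
  sequence = c("ACGTACGT", "ACGTACGA"),
  abundance = c(10L, 3L),
  stringsAsFactors = FALSE
)

# Build id lines carrying the abundance as size=..., usearch/uchime style.
# The "sq<i>" naming is an assumption of this sketch.
ids <- paste0(">sq", seq_len(nrow(clustering)),
              ";size=", clustering$abundance, ";")

# Interleave id lines and sequences: id1, seq1, id2, seq2, ...
fasta_lines <- as.vector(rbind(ids, clustering$sequence))

# Write to a temporary file; swap in a real path such as 'forward/name.fa'.
fout <- tempfile(fileext = ".fa")
writeLines(fasta_lines, fout)
```

Every row of the data.frame ends up as one fasta record, so no sequences are dropped relative to the $sequence column.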

galud27 commented 6 years ago

Ok, great! Thank you.