fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

fast5_pass only or also fast5_fail? #59

Open ecpierce opened 1 year ago

ecpierce commented 1 year ago

Hi!

I am currently re-basecalling my raw fast5 files with the fast5_out option so that I can use them as input to nanodisco. Do you recommend basecalling fast5 files from both the fast5_pass and fast5_fail folders that Guppy creates during live basecalling, or should I use only the fast5_pass files? Would you expect interesting modifications (not necessarily just methylation of specific residues) to produce reads with lower quality scores on average, which might make the fast5_fail reads interesting as well?

Thanks! Emily

touala commented 1 year ago

Hi Emily,

nanodisco was implemented using unfiltered input fast5 files so that it can handle most situations users will face. In practice, I would consider two things when deciding whether to use pass reads only or all of the data. First, if coverage is limited with pass reads only, adding the remaining reads could help. Second, if you observe an enrichment of fail reads in certain regions of interest, including them is worth considering. In my experience, I do not expect that using only pass reads would cause motifs to be missed, because methylation motifs have many occurrences across the genome, but there might be rare situations that I'm not aware of.
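To make the two checks above concrete, here is a minimal Python sketch. It is not part of nanodisco. It assumes you have already mapped the pass and fail reads to your reference and extracted their alignment start positions for one contig. The function names, the window size, and both thresholds are hypothetical and only for illustration, and counting read starts per window is a rough proxy for coverage, not true depth.

```python
from collections import Counter


def fail_fraction_by_window(pass_starts, fail_starts, window=10_000):
    """Count pass and fail read starts per fixed-size window.

    Returns {window_index: (n_pass, n_fail, fail_fraction)}.
    Inputs are assumed to be alignment start positions on one contig.
    """
    pass_counts = Counter(pos // window for pos in pass_starts)
    fail_counts = Counter(pos // window for pos in fail_starts)
    stats = {}
    for w in sorted(set(pass_counts) | set(fail_counts)):
        n_pass = pass_counts[w]
        n_fail = fail_counts[w]
        # Every window listed here has at least one read, so no division by zero.
        stats[w] = (n_pass, n_fail, n_fail / (n_pass + n_fail))
    return stats


def flag_windows(stats, min_pass_reads=20, max_fail_fraction=0.5):
    """Return (low_coverage_windows, fail_enriched_windows).

    Both thresholds are arbitrary placeholders; pick values that suit your data.
    """
    low_cov = [w for w, (n_pass, _, _) in stats.items() if n_pass < min_pass_reads]
    fail_rich = [w for w, (_, _, frac) in stats.items() if frac > max_fail_fraction]
    return low_cov, fail_rich


# Toy positions, purely illustrative.
pass_starts = [100, 250, 900, 1500, 1700]
fail_starts = [1200, 1300, 1800, 1900]
stats = fail_fraction_by_window(pass_starts, fail_starts, window=1000)
low_cov, fail_rich = flag_windows(stats, min_pass_reads=3, max_fail_fraction=0.5)
print(stats, low_cov, fail_rich)
```

If either list contains windows that overlap your regions of interest, that would be a reason to include the fail reads. Windows with no reads at all do not appear in the result, so they need to be checked separately.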

I hope this helps.

Best,

Alan