BioInf-Wuerzburg / proovread

PacBio hybrid error correction through iterative short read consensus
MIT License

Subreads per SMRT cell or combined #114

Closed faraz89 closed 6 years ago

faraz89 commented 6 years ago

Hi Thomas,

  1. I have 4 subreads.fq files (1 per SMRT cell), and I have used pbsmrtpipe to merge them into one subreadSet containing all 4. Can I do the chunking on this merged subreadSet, or do I need to do it separately per SMRT cell?

  2. I have 12X long-read coverage and a genome size of 775 Mbp, so I think the chunk size should be around 1 GB. Does that make sense?

Thanks in advance.

thackl commented 6 years ago
  1. It doesn't matter whether you chunk the individual cells or the merged file.
  2. 1 GB sounds good.

And just as a tip: create one tiny chunk first (5 MB or so), with something like SeqChunker -s 5M -l 1 subreadSet.fq > tiny.fq, and run proovread on it as a test to see if everything works as expected. That is better than starting with a 1 GB chunk and waiting a few hours only to run into potential problems.
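
For reference, the arithmetic behind the chunk-size question can be sketched as below. This is only a rough estimate, not something proovread or SeqChunker provides: the helper names are hypothetical, the figure of about 2 bytes per base in FASTQ (one sequence character plus one quality character, ignoring headers) is an approximation, and it assumes the chunk size is counted in bytes of the input file.

```python
def total_long_read_bases(coverage, genome_size_bp):
    """Total bases in the long-read set: coverage times genome size."""
    return coverage * genome_size_bp


def estimated_chunks(total_bases, chunk_bytes, bytes_per_base=2):
    """Rough number of chunks for a FASTQ file.

    Assumption: about 2 bytes per base (sequence + quality character),
    with header lines and newlines ignored.
    """
    fastq_bytes = total_bases * bytes_per_base
    # Ceiling division using integers only
    return -(-fastq_bytes // chunk_bytes)


# Values from the question: 12X coverage, 775 Mbp genome, 1 GB chunks
total = total_long_read_bases(12, 775_000_000)
chunks = estimated_chunks(total, 1_000_000_000)

print(f"long-read bases: {total:,}")
print(f"approx. number of 1 GB chunks: {chunks}")
```

Under these assumptions, 12X of a 775 Mbp genome is about 9.3 Gbp of long reads, which would split into roughly 19 chunks of 1 GB each.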