PengNi / basemods_spark


ipdSummary.py takes a lot of time on repeat regions of a reference #4

Closed PengNi closed 6 years ago

PengNi commented 6 years ago

After aligning reads to the reference, if a chunk of the reference is a repeat region with a mean coverage above 10k, ipdSummary.py takes a long time to complete when maxCoverage is set to 200, 300, or -1. With maxCoverage set to 100, the run time is much shorter than with 200, 300, or -1.

PengNi commented 6 years ago

The only way to solve it is to select a suitable maxCoverage, maybe 250. Ref: http://www.pacb.com/wp-content/uploads/2015/09/WP_Detecting_DNA_Base_Modifications_Using_SMRT_Sequencing.pdf
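
For anyone wondering why the cap matters so much in a 10k-coverage repeat, here is a rough Python sketch of what a per-site coverage cap does to the amount of data kept. It is only an illustration: `cap_coverage` is a hypothetical helper, the random subsampling is an assumption, and none of this is taken from the ipdSummary.py source. The only thing borrowed from the thread is that -1 means "no cap".

```python
import random


def cap_coverage(ipd_values, max_coverage, seed=0):
    """Keep at most max_coverage observations for one site.

    Hypothetical helper for illustration only. A negative max_coverage
    (e.g. -1) is treated as "no cap", mirroring the setting in the thread.
    """
    values = list(ipd_values)
    if max_coverage < 0 or len(values) <= max_coverage:
        return values
    # Assumption: subsample uniformly at random; the real tool may
    # choose which observations to keep differently.
    rng = random.Random(seed)
    return rng.sample(values, max_coverage)


if __name__ == "__main__":
    # One site inside a repeat region with 10k observations.
    site = [0.01 * i for i in range(10000)]
    for cap in (100, 250, -1):
        kept = cap_coverage(site, cap)
        print(f"maxCoverage={cap}: {len(kept)} observations kept")
```

With a cap of 100 or 250 each site carries at most that many observations, while -1 keeps all 10000, so any per-site work that grows with the number of observations grows accordingly.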