fanglab / nanodisco

nanodisco: a toolbox for discovering and exploiting multiple types of DNA methylation from individual bacteria and microbiomes using nanopore sequencing.

Changes in coverage before and after Nanodisco preprocessing #29

Closed GeorgiaBreckell closed 1 year ago

GeorgiaBreckell commented 2 years ago

Hi Alan,

We normalized our fast5 coverage across multiple samples to around 120x prior to running Nanodisco. We checked coverage with BWA, using the same commands Nanodisco runs. When we looked at the coverage reported by the preprocessing BAM outputs, we observed that coverage dropped by 20-40x depending on the sample.
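For context, a coverage check of this kind might be sketched as follows (assuming samtools is available; mapped.sam and the thread count are placeholders, not necessarily the exact commands used):

# sort and index the bwa mem output (placeholder file names)
samtools sort -@ 4 -o mapped.sorted.bam mapped.sam
samtools index mapped.sorted.bam
# mean depth over all reference positions, including zero-coverage sites (-a)
samtools depth -a mapped.sorted.bam | awk '{sum+=$3} END {print "mean coverage:", sum/NR}'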

I'm assuming some of the fast5 reads are not being converted to fasta, or are being filtered out, but we aren't sure why or what we can change to avoid this.

Do you have any insight into why this might have occurred and how we can avoid this?

Regards

touala commented 2 years ago

Hi @GeorgiaBreckell,

I'm surprised that it doesn't give you similar coverage. I often downsample high-coverage datasets and I don't remember ever seeing a >10% discrepancy. Also, the data is not filtered during the nanodisco preprocessing command; all alignments are kept. Am I right that you used the following command to perform the original mapping? Which bwa version was used (nanodisco v1.0.3 uses 0.7.15)?

bwa mem -t $nb_threads -x ont2d $path_reference_genome $path_fasta 

How did you proceed to downsample the dataset? Sometimes I use the fast5_subset command from the ONT API (ont_fast5_api) to generate a new set of fast5 files. Otherwise, you can directly take a random sample of the .fasta file output by nanodisco preprocessing and remap that subset of reads. Both approaches should give you similar results. Do you have a way to check whether all the expected reads are found after downsampling + preprocessing?
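For example, a rough sketch of that check (read_ids.txt and the paths are placeholders; fast5_subset is the tool shipped with ont_fast5_api):

# subset the fast5 files to the read IDs kept when downsampling (one ID per line)
fast5_subset --input fast5_all/ --save_path fast5_downsampled/ --read_id_list read_ids.txt
# compare the number of reads requested with the number that reach the preprocessed fasta
wc -l read_ids.txt
grep -c "^>" downsampled.fasta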

Maybe the discrepancy comes from how the initial coverage was computed? How do you compute it? Is it using only the reads in the pass folder, for example?
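For instance, comparing pass-only against pass + fail read counts would show whether that alone explains the gap (a sketch with placeholder paths, assuming the basecaller's usual fastq output folders):

# fastq records are 4 lines each; NR accumulates across all files given to awk
awk 'END {print NR/4, "pass reads"}' pass/*.fastq
awk 'END {print NR/4, "pass + fail reads"}' pass/*.fastq fail/*.fastq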

Sorry for the barrage of questions; I'm not sure where to look. I suppose you could, by default, add back a 20% margin when downsampling, but that could hide an important issue.

Alan