Closed GeorgiaBreckell closed 1 year ago
Hi @GeorgiaBreckell,
I'm surprised that it doesn't give you similar coverage. I often downsample high-coverage datasets and I don't remember seeing a >10% discrepancy. Also, the data is not filtered during the nanodisco preprocessing command; all alignments are conserved. Do I understand correctly that you used the following command to perform the original mapping? And which bwa version was used (nanodisco v1.0.3 uses 0.7.15)?
bwa mem -t $nb_threads -x ont2d $path_reference_genome $path_fasta
How did you proceed to downsample the dataset? Sometimes I use the fast5_subset command from the ONT API to generate a new set of fast5 files. Otherwise, you can directly randomly sample the .fasta file output from nanodisco preprocessing and remap the subset of reads. Both approaches should give you similar results, though. Do you have a way to check whether all the expected reads are found after downsampling + preprocessing?
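For reference, randomly sampling a .fasta file can be sketched roughly like this (a minimal illustration, not nanodisco code; the paths and fraction are placeholders):

```python
import random

def subsample_fasta(in_path, out_path, fraction, seed=42):
    """Randomly keep ~`fraction` of the records from a FASTA file."""
    random.seed(seed)
    records, current = [], []
    with open(in_path) as fin:
        # Group each header line ('>...') with its sequence lines.
        for line in fin:
            if line.startswith(">"):
                if current:
                    records.append(current)
                current = [line]
            else:
                current.append(line)
        if current:
            records.append(current)
    kept = [rec for rec in records if random.random() < fraction]
    with open(out_path, "w") as fout:
        for rec in kept:
            fout.writelines(rec)
    return len(records), len(kept)
```

The subset can then be remapped with the same bwa mem command as above.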
Maybe the discrepancy comes from how the initial coverage is computed? How do you compute it? Is it using only the reads in the pass folder, for example?
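The computation method matters because a pre-mapping estimate (total sequenced bases over genome length) counts every base in every read, whereas a BAM-based measure only counts aligned bases, so the two can legitimately differ. A minimal sketch of the naive estimate (the read lengths and genome size below are made-up numbers):

```python
def mean_coverage(read_lengths, genome_length):
    """Naive coverage estimate: total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length

# e.g. 30,000 reads of ~8 kb over a 2 Mb genome
print(mean_coverage([8000] * 30000, 2_000_000))  # → 120.0
```

Comparing this number against a depth computed from the BAM (e.g. with samtools depth) would show how much of the gap is just unaligned or clipped bases.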
Sorry for the barrage of questions; I'm not sure where to look. I suppose you could, by default, add back a 20% margin when downsampling, but that could hide an important issue.
Alan
Hi Alan,
We normalized our fast5 coverage across multiple samples to around 120x prior to running Nanodisco. We checked coverage with bwa, using the same commands Nanodisco runs. When we looked at the coverage reported by the preprocessing BAM outputs, we observed that coverage dropped by 20-40x, depending on the sample.
I'm assuming some of the fast5 reads are not being converted to fasta, or are being filtered out, but we aren't sure why or what we can change to avoid this. Do you have any insight into why this might have occurred and how we can prevent it?
Regards