Closed ASLeonard closed 3 years ago
I tried it anyway with `cut ... $illum.dump | paste $hifi.dump - | ...`, so the axes may be flipped from the labels.
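To make the axis-order concern concrete, here is a minimal Python sketch of what a `paste`-style row-by-row pairing does. With `paste $hifi.dump -`, the HiFi columns come first and the piped Illumina columns second, so whichever file is listed first ends up as the first value of each pair. The row layout (a k-mer followed by one value), the `col` parameter and the helper name `pair_and_tally` are assumptions for illustration only; the real dump format may differ.

```python
from collections import Counter


def pair_and_tally(first_lines, second_lines, col=-1):
    """Pair two dumps row by row (as `paste` would) and count each value pair.

    Assumes both dumps list the same k-mers in the same order and that the
    value of interest sits in whitespace-separated column `col` (hypothetical).
    """
    first_lines, second_lines = list(first_lines), list(second_lines)
    if len(first_lines) != len(second_lines):
        raise ValueError("dumps must have the same number of rows")
    tally = Counter()
    for a, b in zip(first_lines, second_lines):
        # The first file always supplies the first element of the pair.
        tally[(a.split()[col], b.split()[col])] += 1
    return tally


# Toy rows, not real data.
hifi = ["AAAC 0.00", "AAAG 0.00", "AAAT 1.00"]
illum = ["AAAC 0.00", "AAAG -1.00", "AAAT 0.00"]

hifi_first = pair_and_tally(hifi, illum)   # pairs are (hifi, illumina)
illum_first = pair_and_tally(illum, hifi)  # pairs are (illumina, hifi)
```

Swapping the argument order transposes every pair, which is the same as flipping the plot about its diagonal.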
This was using the merged hap1 + hap2 fasta file with HiFi and short reads, but the short reads had fairly low coverage (~16x).
There is an approximate R of -0.03, but the top three values below account for ~61% of all points, and so probably bias that heavily.
3185301570 0.00 0.00
348440924 0.00 -1.00
143715045 0.00 1.00
It is interesting that the two axes are pretty heavily populated, but not the diagonal. I guess this may demonstrate that k-mer bias for HiFi is pretty independent of k-mer bias for short reads?
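As a rough illustration of why an R near zero goes together with mass sitting on the axes, here is a sketch of a count-weighted Pearson correlation over `(count, x, y)` rows shaped like the three shown above. It is not necessarily how the R of -0.03 was computed. The function names and the toy rows are hypothetical, and the rows are not the real histogram.

```python
import math


def weighted_pearson(rows):
    """Pearson R over (count, x, y) rows, weighting each bin by its count."""
    n = sum(c for c, _, _ in rows)
    mx = sum(c * x for c, x, _ in rows) / n
    my = sum(c * y for c, _, y in rows) / n
    sxy = sum(c * (x - mx) * (y - my) for c, x, y in rows)
    sxx = sum(c * (x - mx) ** 2 for c, x, _ in rows)
    syy = sum(c * (y - my) ** 2 for c, _, y in rows)
    if sxx == 0 or syy == 0:
        return float("nan")  # R is undefined when one axis has no spread
    return sxy / math.sqrt(sxx * syy)


def top_fraction(rows, k=3):
    """Fraction of all points that fall in the k most populated bins."""
    counts = sorted((c for c, _, _ in rows), reverse=True)
    return sum(counts[:k]) / sum(counts)


# Toy histogram: both axes populated, nothing on the diagonal.
toy = [(6, 0.0, 0.0), (2, 0.0, -1.0), (2, 0.0, 1.0), (1, 1.0, 0.0), (1, -1.0, 0.0)]
r = weighted_pearson(toy)
frac = top_fraction(toy)
```

When the means are close to zero, a bin on either axis has x = 0 or y = 0, so it adds nothing to the cross term `sxy`. Only off-axis bins can move R away from zero, which fits the independence reading.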
Hi @ASLeonard , just saw this now. Sorry for the silence!
Yes, as far as I can tell, the k-mer bias was independent, so to speak. The different error modes in HiFi and Illumina seem to be the cause of this; we found homopolymer and microsatellite contraction in HiFi reads and the long-known GC biases in Illumina reads as shown here in T2T-CHM13.
I haven't run merfin on my illumina data yet, and wasn't entirely clear on the usage of the cartesian plot scripts. The input for cartesian_plot.R is the output of simplify_dump.sh, and the input for that should be `$1=illumina.dump` and `$2=hifi.dump`?