Closed glennhickey closed 4 years ago
Slower but not disastrously so (about 30% on chr20).
Command being timed: "hal2vg lc2019_12ont-hg38.cactus.minimap2_star-all-to-ref-fatanc-no-secondary-july-8.hal --progress --inMemory --onlySequenceNames"
Original:
User time (seconds): 158.03
System time (seconds): 2.34
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:40.47
Maximum resident set size (kbytes): 3973656
Fix:
User time (seconds): 201.93
System time (seconds): 2.73
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:25.34
Maximum resident set size (kbytes): 4049860
Original stats:
nodes 5859015
edges 10307039
length 80506059
Fix:
nodes 2912990
edges 4474872
length 76935653
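For reference, the relative changes can be worked out from the figures above. This is a small sketch that only re-derives percentages from the reported numbers; the helper names (`pct_change`, `to_seconds`) and the dictionary keys are made up for illustration and are not part of hal2vg.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def to_seconds(clock):
    """Convert a /usr/bin/time style 'm:ss.xx' or 'h:mm:ss' string to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Figures copied from the timing and stats output reported above.
original = {
    "user_s": 158.03,
    "wall_s": to_seconds("2:40.47"),
    "max_rss_kb": 3973656,
    "nodes": 5859015,
    "edges": 10307039,
    "length": 80506059,
}
fix = {
    "user_s": 201.93,
    "wall_s": to_seconds("3:25.34"),
    "max_rss_kb": 4049860,
    "nodes": 2912990,
    "edges": 4474872,
    "length": 76935653,
}

for key in original:
    print(f"{key:>10}: {pct_change(original[key], fix[key]):+.1f}%")
```

On these numbers the fix costs roughly 28% more time and about 2% more memory, while the graph ends up with about half the nodes and a bit less than half the edges.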
Currently, hal2vg only pinches exact matches along branches. This is a problem, particularly for star trees, as homologies between sibling genomes will be missed in the presence of SNPs, i.e. if they have a different base than the ancestor. This PR adds a patch that, when a SNP is found with respect to the parent, uses a column iterator to find exact homologous matches further away in the tree. This should catch all missing homologies, but I'm concerned about speed and memory.
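The effect can be illustrated with a toy model. This is only a sketch under strong assumptions, not the hal2vg implementation: it uses an ungapped star alignment where position i is homologous across all genomes, a plain union-find in place of the real pinch graph, and a scan over siblings in place of the HAL column iterator. All names (`UnionFind`, `pinch`, `use_column_fallback`) are hypothetical.

```python
class UnionFind:
    """Minimal union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def pinch(ancestor, children, use_column_fallback):
    """Return the number of merged base groups (a stand-in for graph nodes).

    Toy assumption: position i of every sequence is in the same column.
    """
    uf = UnionFind()
    for pos, anc_base in enumerate(ancestor):
        uf.find(("anc", pos))
        for name, seq in children.items():
            uf.find((name, pos))
            if seq[pos] == anc_base:
                # Exact match along the branch: pinch child to parent.
                uf.union(("anc", pos), (name, pos))
            elif use_column_fallback:
                # SNP with respect to the parent: look across the column
                # for siblings that carry the same base and pinch to them.
                for other, other_seq in children.items():
                    if other != name and other_seq[pos] == seq[pos]:
                        uf.union((name, pos), (other, pos))
    return len({uf.find(x) for x in list(uf.parent)})


ancestor = "ACGT"
children = {"a": "ACTT", "b": "ACTT", "c": "ACGT"}

branch_only = pinch(ancestor, children, use_column_fallback=False)
with_fallback = pinch(ancestor, children, use_column_fallback=True)
print(branch_only, with_fallback)  # prints "6 5"
```

Siblings a and b share a T where the ancestor has a G. Branch-only pinching leaves them as two separate groups, while the column fallback merges them. The per-SNP scan over the column is also where the extra running time would be expected to come from.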
Resolves #26