ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Small hacks to maybe avoid potential pangenome race condition #1507

Closed glennhickey closed 3 weeks ago

glennhickey commented 1 month ago

I ran into an invalid PAF error (from convertCoordinates() in cactus_consolidated), where the start coordinate exceeded the sequence length. After some digging, it turns out that the FASTA file passed into cactus_consolidated was missing the last ~75kb or so. This FASTA file is created by the cactus-graphmap-split phase. It's checkpointed to disk with toil.exportFile and then read back with toil.importFile before going into consolidated. The version checkpointed to disk is fine, but the version in the jobstore going into the job itself is short. As unlikely as it seems, the only explanation I can think of is that the import somehow happened before the export finished.
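For anyone trying to reproduce this kind of truncation, here is a minimal sketch of comparing the checkpointed copy against the copy the job actually received. It uses only the Python standard library and does not touch Toil at all; the function names `file_fingerprint` and `describe_mismatch` are hypothetical helpers made up for illustration, not part of cactus or Toil.

```python
import hashlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large FASTA files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def describe_mismatch(exported_path, imported_path):
    """Return None if the two copies are byte-identical, else a short description.

    Hypothetical helper: `exported_path` is the checkpointed file on disk and
    `imported_path` is a local copy of what the downstream job read.
    """
    exp_size, exp_hash = file_fingerprint(exported_path)
    imp_size, imp_hash = file_fingerprint(imported_path)
    if exp_size != imp_size:
        return (
            f"size differs: exported={exp_size} bytes, "
            f"imported={imp_size} bytes"
        )
    if exp_hash != imp_hash:
        return "same size but different content"
    return None
```

A truncated import like the one described above would show up as a "size differs" result, while a `None` result would suggest the problem lies elsewhere.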

So this PR does two things that may or may not help: