I ran into an invalid PAF error (from convertCoordinates() in cactus_consolidated), where the start coordinate exceeded the sequence length. After some digging, it turns out that the FASTA file passed into cactus_consolidated was missing roughly the last 75kb. This FASTA file is created by the cactus-graphmap-split phase. It's checkpointed to disk with toil.exportFile and then read back with toil.importFile before going into cactus_consolidated. The version checkpointed to disk is fine, but the version in the jobstore going into the job itself is short. As unlikely as it seems, the only explanation I can think of is that the import somehow happened before the export finished.
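For anyone trying to reproduce this, the truncation can be confirmed by comparing the checkpointed copy against the copy the job actually reads. The snippet below is only a minimal sketch of that check, not code from this PR: `file_digest` and `describe_mismatch` are hypothetical helper names, and the two paths are whatever local files you have pulled out for comparison.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def describe_mismatch(exported_path, imported_path):
    """Return a short description of how the two files differ, or None if identical.

    exported_path: the copy checkpointed to disk (hypothetical local path)
    imported_path: the copy handed to the job (hypothetical local path)
    """
    exported_size = os.path.getsize(exported_path)
    imported_size = os.path.getsize(imported_path)
    if exported_size != imported_size:
        return (
            f"size mismatch: exported {exported_size} bytes, "
            f"imported {imported_size} bytes"
        )
    if file_digest(exported_path) != file_digest(imported_path):
        return "same size but different content"
    return None
```

A size mismatch with the imported copy being the shorter one would match what I saw here.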
So this PR does two things that may or may not help:
- Explicitly close the outfile file handle in cactus_call that's passed in to stdout in Popen(). The logic is that leaving it open might delay its buffer being flushed. I'm pretty doubtful about this, but it shouldn't hurt.
- Add a time.sleep() call at the end of each workflow phase (right after file export) in cactus-pangenome. This is super hacky but may be enough to patch over whatever low-level bug is causing this in the first place. Currently set to 5 seconds after minigraph and map, and 10 seconds after split and align.
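To illustrate the first change, here is a rough sketch of the pattern, not the actual cactus_call code: `run_to_file` is a hypothetical helper, and the `os.fsync` call is an extra step I'm assuming could help, not something this PR adds. Note that the child process writes straight to the file descriptor, so Python's own buffer for the handle is normally empty, which is part of why I'm doubtful that closing it changes anything.

```python
import os
import subprocess
import sys


def run_to_file(cmd, outfile_path):
    """Run cmd with stdout redirected to outfile_path, closing the handle when done.

    Hypothetical helper for illustration only; it is not cactus_call.
    """
    # The with-block guarantees the handle is closed once the child has exited.
    with open(outfile_path, "wb") as outfile:
        proc = subprocess.Popen(cmd, stdout=outfile)
        returncode = proc.wait()
        # Assumption: ask the OS to push its buffers to disk before anything
        # else reads the file. This goes beyond what the PR itself does.
        outfile.flush()
        os.fsync(outfile.fileno())
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return outfile_path
```

For example, `run_to_file([sys.executable, "-c", "print('hello')"], "out.txt")` writes the child's output to out.txt and only returns after the handle has been closed.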