The `trim_unaligned_sequences` job currently estimates its memory as a function of its input size:
```python
memory=cactus_clamp_memory(8*sum([seq.size for seq in sequences]) + 32*alignments.size)
```
This seems to have been working fairly well, but @ph09 ran into an issue where the job estimated the minimum value of 2G while needing roughly 11G. It turns out this happens at the root node of a 3-human alignment, where the input PAF is 40MB but there are no input sequences, which explains the minimal estimate. But I guess `paffy to_bed` uses memory proportional to the ingroup sequences, so the job runs out of memory.
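To illustrate why the estimate bottoms out, here is a minimal sketch of the clamping behavior. The `cactus_clamp_memory` body and the 2 GiB floor are assumptions mirroring the snippet above, not the actual cactus code:

```python
# Hypothetical stand-ins for illustration; the floor/ceiling values are assumed.
MIN_MEMORY = 2 * 1024**3    # assumed 2 GiB floor ("2G" in the report)
MAX_MEMORY = 512 * 1024**3  # assumed arbitrary ceiling

def cactus_clamp_memory(estimate: int) -> int:
    """Clamp a raw byte estimate into [MIN_MEMORY, MAX_MEMORY]."""
    return max(MIN_MEMORY, min(MAX_MEMORY, estimate))

# Root node of the 3-human alignment: no input sequences, ~40 MB PAF.
seq_sizes = []              # no sequences reach the job at the root
paf_size = 40 * 1024**2     # ~40 MB alignments file

estimate = cactus_clamp_memory(8 * sum(seq_sizes) + 32 * paf_size)
print(estimate / 1024**3)   # 2.0 -- the 1.25 GiB raw estimate is clamped up to the floor
```

With no sequences, the whole estimate rides on the PAF term, which at 32 × 40 MB is still below the floor, so the job is scheduled with 2G regardless of what `paffy to_bed` actually needs.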
This PR changes the logic so that outgroup trimming is turned off when there are no outgroups. From what I can tell looking at the code, it's not doing anything in these cases other than sometimes running out of memory.
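The shape of the change can be sketched roughly as follows. The function and parameter names here are illustrative, not the actual cactus API:

```python
# Hypothetical sketch of the guard added by this PR; names are illustrative.
def trim_unaligned_sequences_sketch(ingroups, outgroups, alignments,
                                    run_trimming=None):
    """Skip outgroup trimming entirely when there are no outgroups."""
    if not outgroups:
        # Trimming is a no-op here apart from its memory cost,
        # so pass the inputs through unchanged.
        return ingroups, alignments
    return run_trimming(ingroups, outgroups, alignments)

# With an empty outgroup list the expensive trimming step is never invoked.
seqs, paf = trim_unaligned_sequences_sketch(["in1.fa", "in2.fa"], [], "aln.paf")
```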
But @benedictpaten could you please take a quick look at the file delta here and confirm I'm not breaking anything by turning off outgroup trimming this way? Thanks!