Closed by gadomski 8 years ago
Okay, so the problem looks like this: a p_max_total of 38900626766413400269643493897280213056902576233961077487817475035075152244960, which is ludicrous. The first bug is that this overflows p_max_total in fgt, which quickly creates chaos. I've opened gadomski/fgt#34 to more gracefully handle this case upstream.

I'll add some API knobs that will make it easier to disable bits and pieces that might be getting in the way, and include them in a PR referencing this ticket to see if we can't make things a bit nicer, at least.
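
For a rough feel of why a term count like that can't fit in a machine integer, here is a minimal sketch. It assumes the total is the standard count of multivariate monomials of total degree below p_max in d dimensions, i.e. C(p_max - 1 + d, d); whether fgt computes p_max_total exactly this way is an assumption on my part, and `term_count` and `fits_in_u64` are hypothetical helper names, not fgt API.

```python
from math import comb

# Largest value an unsigned 64-bit integer can hold.
U64_MAX = 2**64 - 1


def term_count(dims: int, p_max: int) -> int:
    """Number of monomials of total degree < p_max in `dims` variables.

    Assumption: this stands in for how a p_max_total-style value grows;
    it is not copied from fgt.
    """
    return comb(p_max - 1 + dims, dims)


def fits_in_u64(n: int) -> bool:
    """True if n can be stored in an unsigned 64-bit integer."""
    return n <= U64_MAX


# Python integers are arbitrary precision, so the true count is exact here.
for dims, p_max in [(3, 10), (100, 10), (100, 100)]:
    n = term_count(dims, p_max)
    print(f"dims={dims:3d} p_max={p_max:3d} terms={n:.3e} "
          f"fits_in_u64={fits_in_u64(n)}")
```

The same truncation order that is harmless in 3 dimensions stops fitting in 64 bits once the dimension count gets high enough, which is the kind of check that makes sense to run before allocating anything.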
cc @courageon
gadomski/fgt#34 is complete, meaning that we should get better (and quicker) notification if things are going awry under the hood. The second half of this ticket, the cpd part, is still unfinished — I'd like a knob to be able to turn on/off ifgt, so we can easily test to see if direct-tree is usable in this case.
We've got a report in Gitter that affine borks with 50k point datasets w/ about 32G of memory, which seems...worse than I'd expect. Though, re-reading the discussion, these are 100 dimension point sets, so maybe it's not that surprising. Check it out anyways, see if we can reproduce.
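
Some back-of-the-envelope arithmetic for that report, as a sketch only. It assumes 8-byte doubles and that some step materializes a dense N x M matrix over all point pairs; I haven't confirmed that the affine path actually does that, and `dense_matrix_bytes` is a hypothetical helper.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store a dense rows x cols matrix of fixed-size values."""
    return rows * cols * bytes_per_value


n = m = 50_000   # points in each set, per the report
dims = 100       # dimensions per point, per the report

# The two point sets themselves are small.
points = 2 * dense_matrix_bytes(n, dims)

# Assumption: one dense matrix with an entry per (fixed, moving) point pair.
pairwise = dense_matrix_bytes(n, m)

print(f"point sets: {points / 1e9:.2f} GB")
print(f"one dense N x M matrix: {pairwise / 1e9:.2f} GB")
```

If that assumption holds, a single pairwise matrix is already a large share of 32G, and a second copy would not fit, so the point count would matter more than the 100 dimensions.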