It would still be interesting to know what's going on here, but we've since changed the default allocator to mimalloc, and we're unlikely to ever get back to this. Closing.
Running the triangular solver miniapp on 20k by 20k matrices with a block size of 128 using tcmalloc shows worrying behaviour:
The first iteration is on par with e.g. mimalloc.
This may be a tcmalloc bug or a "deliberate tradeoff" that tcmalloc makes, or it may indicate something sketchy in DLA-Future. It would be good to do at least some investigation to make sure it's not the latter.
Related to #587.