Precalculate more things in `Grid` and/or `GridRSP` constructors

rafaqz commented 10 hours ago

This is basically ubiquitous:

_targetidx_and_nodes(grsp.g)

So these may as well be precomputed fields on Grid?

This is also a very common allocation, we could just do once in GridRSP?

        [grsp.g.source_qualities[i] for i in grsp.g.id_to_grid_coordinate_list],
        [grsp.g.target_qualities[i] for i in grsp.g.id_to_grid_coordinate_list ∩ targetidx],

These will be a constant source of medium sized allocations and gc churn when running multiple operations over many tiles.

I'm not totally sure which object Grid/GridRSP these should go on.

andreasnoack commented 9 hours ago

I'm not objecting to anything and I'm not really in a position to do so but my general thinking has been that the more things you store, the higher is the risk that the state becomes invalid, so there should always be somewhat heavy computational reasons to store something that can be easily computed from existing quantities. If something ends up as a bottleneck then this is of course a different matter but I usually think it's useful to demosntrate that something is a bottleneck before optimizing. In my experience, the calculations in this packages are heavily dominated by a few big matrix operations.

rafaqz commented 5 hours ago

That makes sense. But I think these objects are calculated from the target_qualities and id_to_grid_coordinate_list fuelds? I can't see where those would change, let me know if I have missed something.

The main bottleneck currently being hit is too much memory allocation on the cluster where each thread doesn't have much memory. So many jobs are failing but with some stochasticity.

There are multiple angles to reduce this, these allocs are a minor detail of a plan to control and quantify allocations so we can a bit more accurately book memory requirements per thread.

If we can allocate most of what we needed before computations then everything else is easier, and even if we still miscalculate, memory-related failures will happen immediately rather than during a long job.

A bigger problem than this issue is the Matrix calls on sparse rhs matrices for \ in GridRSP. That can likely be reduced by running the solve internals column by column, and possibly by threading over separate solver columns so memory is shared across threads, rather than each thread needing all the allocations of a separate GridRSP.

But overall I'm looking to go over things to control memory use a bit more, and not every step will achieve a lot on its own.

andreasnoack commented 4 hours ago

I can't see where those would change, let me know if I have missed something.

As I remember it, I don't think they can change. Actually, I deliberately tried to write the code such that mutation is avoided to make it easier to reason about the state. However, the more fields you have a direct mapping from one to the other, the higher a risk I think there is that a later change to the code invalidates assumptions elsewhere.

The main bottleneck currently being hit is too much memory allocation on the cluster where each thread doesn't have much memory. So many jobs are failing but with some stochasticity.

Reducing memory pressure makes sense. I just don't think these arrays matter much in the big picture. I agree that possible optimization fo the solves are much more likely to make a difference.

I just checked the old issue tracker where I started this implementation and found

I thought I had put some notes in the tracker about low memory representations of the inverse matrix. When the landscape is a regular grid, it is possible, in theory, to represent the inverse of I - W in O(n) memory but it's numerically difficult/impossible so I never found a way to do it and maybe it will be a waste of time to explore this route further. A keyword is "semi-separable", in case you haven't come along this before.

rafaqz commented 4 hours ago

Yes, these are totally minor details. Mostly just as part of a policy of allocating what we can easily allocate up front.

For the solve, I tried an optimisation that breaks the RHS array into columns and calls the UMFPACK.solve! on each column directly (as / does internally). A pretty good reduction in memory and has some performance gains on the basic example workflow: https://github.com/ConScape/ConScape.jl/pull/23

Maybe it won't work in all contexts if e.g. rectangular matrices don't hit the same function calls,I'm not that far in

(That PR also allows reuse of Z, so we can use the same memory on multiple windows over larger data, so the allocs will be very low after initially allocating the largest required Z. We could do the same with F from lu but resizing is harder)

ConScape / ConScape.jl

Precalculate more things in `Grid` and/or `GridRSP` constructors #28