apasto / decorrelate-grids

Given two 2D arrays (grids), perform a windowed linear regression between them. Aim: removal of the correlated component between two variables.
Apache License 2.0
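The description above can be illustrated with a minimal serial sketch. It assumes that B is regressed on A within each window and that the fitted line, evaluated at the window centre, is subtracted from B. The repository may choose the regression direction, the regression routine and the edge handling differently. `rolling_decorrelate` is a hypothetical name, and `np.polyfit` stands in for the actual regression call.

```python
import numpy as np


def rolling_decorrelate(a, b, hw_y, hw_x):
    """Hypothetical serial sketch: remove from b the component linearly correlated
    with a, using a (2*hw_y + 1) x (2*hw_x + 1) moving window.
    Cells closer than the half-widths to the border are left as NaN."""
    ny, nx = a.shape
    residual = np.full((ny, nx), np.nan)
    for i in range(hw_y, ny - hw_y):
        for j in range(hw_x, nx - hw_x):
            a_win = a[i - hw_y:i + hw_y + 1, j - hw_x:j + hw_x + 1].ravel()
            b_win = b[i - hw_y:i + hw_y + 1, j - hw_x:j + hw_x + 1].ravel()
            # assumption: least-squares fit b = slope * a + intercept within the window
            slope, intercept = np.polyfit(a_win, b_win, 1)
            # subtract the fitted (correlated) part at the window centre
            residual[i, j] = b[i, j] - (slope * a[i, j] + intercept)
    return residual
```

If B is an exact linear function of A, the interior residual is zero up to rounding, which makes a quick sanity check.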

Parallelism setup: avoid passing the full arrays to each process #3

Open apasto opened 1 year ago

apasto commented 1 year ago

As of 77b9571d1f9126a195535799649d8425cfd81cdb (which implements parallel calls to the regression), the A, B arrays are sliced to extract the rolling window inside wrap_linregress(), which is then called with pool.apply().
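For reference, here is a hypothetical sketch of the pattern described above: the worker receives the full arrays and slices them itself. The parameter names mirror the `kwds` keys used later in this issue. The body is an assumption, with `np.polyfit` as a stand-in regression, and is not the repository's `wrap_linregress()`.

```python
import numpy as np


def wrap_linregress_sketch(a, b, e_i, hw_x_i, hw_y_i):
    # Hypothetical sketch of the current pattern: a and b are the FULL grids,
    # and the window centred on e_i = (row, col) is extracted inside the worker.
    rows = slice(e_i[0] - hw_y_i, e_i[0] + hw_y_i + 1)
    cols = slice(e_i[1] - hw_x_i, e_i[1] + hw_x_i + 1)
    a_win, b_win = a[rows, cols].ravel(), b[rows, cols].ravel()
    # least-squares line b = slope * a + intercept (stand-in for the real regression)
    slope, intercept = np.polyfit(a_win, b_win, 1)
    return slope, intercept
```

Every task built this way carries both complete grids, although only (2*hw_y_i + 1) x (2*hw_x_i + 1) values of each are actually read.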

I am under the impression that slicing beforehand and passing only the contents of each slice to the workers would be less wasteful, memory-wise. This could be implemented by adding a third dimension to two A-, B-like arrays and storing one window slice at each index along this new dimension (thanks to @pogmat for his valuable advice; hopefully I have got it right).
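Here is a minimal sketch of the third-dimension idea. It assumes the list of window centres is known in advance and that every window lies fully inside the grid. `stack_windows` is a hypothetical helper. Index `k` along the new first axis holds the window centred on `elements[k]`, so one such slice, or a chunk of them, can be handed to each worker.

```python
import numpy as np


def stack_windows(a, elements, hw_y, hw_x):
    """Hypothetical helper: copy the (2*hw_y + 1) x (2*hw_x + 1) window centred on
    each (row, col) in `elements` into one slice of a 3D array."""
    wy, wx = 2 * hw_y + 1, 2 * hw_x + 1
    out = np.empty((len(elements), wy, wx), dtype=a.dtype)
    for k, (i, j) in enumerate(elements):
        # assumes the window does not cross the grid border
        out[k] = a[i - hw_y:i + hw_y + 1, j - hw_x:j + hw_x + 1]
    return out
```

Because neighbouring windows overlap, the stack holds up to wy * wx times as many values as the grid itself. Sending windows in chunks, rather than building the whole stack at once, may therefore be worth considering. For all interior cells at once, NumPy's `numpy.lib.stride_tricks.sliding_window_view(a, (wy, wx))` returns a read-only view of shape `(ny - wy + 1, nx - wx + 1, wy, wx)` without copying.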

apasto commented 2 weeks ago

Updating this with some ideas (never mind that almost 2 years have passed since the ones above :sweat_smile:)

Let's consider this (and the serial version just above it, to which the same change could also be applied):

https://github.com/apasto/decorrelate-grids/blob/c41f8bab7a380c86943a417559a02b374e0cb8d3/decorrelategrids/rollinglinreg.py#L190-L204

(for the record: we may get rid of pool.apply and kwds altogether, see #5)
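One point relevant here: `Pool.apply` blocks until its result is ready, so a plain loop of `apply` calls executes one task at a time. The sketch below shows one possible replacement, `Pool.starmap` over pre-extracted window pairs. This is an assumption, not necessarily what #5 proposes. `fit_window` and `fit_windows_parallel` are hypothetical names, and `np.polyfit` stands in for the actual regression.

```python
from multiprocessing import Pool

import numpy as np


def fit_window(a_win, b_win):
    # Hypothetical worker: least-squares line b = slope * a + intercept over one window.
    slope, intercept = np.polyfit(np.ravel(a_win), np.ravel(b_win), 1)
    return slope, intercept


def fit_windows_parallel(window_pairs, processes=None):
    # starmap unpacks each (a_win, b_win) tuple into fit_window's positional arguments,
    # dispatches the tasks to the workers in chunks and returns results in input order.
    # Only the windows are pickled and sent, never the full grids.
    with Pool(processes) as pool:
        return pool.starmap(fit_window, window_pairs)
```

It would be called from the main script under an `if __name__ == "__main__":` guard, for instance as `fit_windows_parallel(zip(a_stack, b_stack))` with two 3D stacks of windows. No `kwds` dictionary is needed.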

We are currently passing the full arrays to each worker: 'a': A.to_numpy(), 'b': B.to_numpy().
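To get an idea of the cost, the check below measures how many bytes a single task has to serialize, since `multiprocessing` pickles the arguments of every task it sends to a worker. The 1000 x 1000 grid and the 5-cell half-widths are made-up values for illustration, not the sizes used in the repository.

```python
import pickle

import numpy as np

# made-up grid standing in for A.to_numpy(); sizes are illustrative only
A = np.random.default_rng(0).random((1000, 1000))
hw_y, hw_x = 5, 5
i, j = 500, 500  # centre of one window

full_payload = len(pickle.dumps(A))
window_payload = len(pickle.dumps(A[i - hw_y:i + hw_y + 1, j - hw_x:j + hw_x + 1]))
print(full_payload, window_payload)
```

At 8 bytes per float64 value, the full grid pickles to roughly 8 MB, while the 11 x 11 window takes about a kilobyte. That gap is paid for every task, and for both A and B.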

Quite trivially, we could slice first and then pass just the window:

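```python
def extract_window(a, e_i, hw_y_i, hw_x_i):
    # window of a centred on e_i = (row, col), of size (2*hw_y_i + 1) x (2*hw_x_i + 1);
    # assumes the window lies fully inside the grid (a negative start index
    # would count from the end of the axis and give a wrong slice)
    return a[e_i[0] - hw_y_i: e_i[0] + hw_y_i + 1,
             e_i[1] - hw_x_i: e_i[1] + hw_x_i + 1]

# only the windows of A and B are sent to the worker, not the full arrays
kwds = {
    'a': extract_window(A.to_numpy(), element, window_halfwidth_y_i, window_halfwidth_x_i),
    'b': extract_window(B.to_numpy(), element, window_halfwidth_y_i, window_halfwidth_x_i),
    'e_i': element,
    'hw_x_i': window_halfwidth_x_i,
    'hw_y_i': window_halfwidth_y_i,
}
```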
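On the worker side, here is a hypothetical counterpart that accepts the same `kwds` keys as the snippet above but expects `a` and `b` to be the windows already. The name and body are assumptions, with `np.polyfit` standing in for the regression.

```python
import numpy as np


def wrap_linregress_window(a, b, e_i, hw_x_i, hw_y_i):
    # Hypothetical worker for the "slice first" variant: a and b are already the
    # windows, so no slicing happens here. hw_x_i and hw_y_i are accepted only so
    # that the same kwds dict can be passed unchanged; they are not used.
    slope, intercept = np.polyfit(np.ravel(a), np.ravel(b), 1)
    # return the centre index too, so the caller knows where the result belongs
    return e_i, slope, intercept
```

It would then be called as `pool.apply(wrap_linregress_window, kwds=kwds)`. Once no slicing is left in the worker, `hw_x_i` and `hw_y_i` could be dropped from both sides.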