Parallelism setup: avoid passing the full arrays to each process

apasto / decorrelate-grids

Given two 2D arrays (grids), perform a windowed linear regression between the two. Aim: removal of correlated component between two variables.

Apache License 2.0


Updating on this with some ideas (disregarding that almost 2 years passed since the ones above :sweat_smile: )

Let's consider this parallel loop (the same applies to the serial version just above it):

https://github.com/apasto/decorrelate-grids/blob/c41f8bab7a380c86943a417559a02b374e0cb8d3/decorrelategrids/rollinglinreg.py#L190-L204

(For the record: we may get rid of pool.apply and kwds, see #5.)

We are passing 'a': A.to_numpy(), 'b': B.to_numpy() to each worker, so the full arrays are sent to every process even though each worker only needs one window.

Quite trivially, we may slice first and pass only the window to each worker:

# Slice out only the window centred on element e_i = (row, col)
def extract_window(a, e_i, hw_y_i, hw_x_i):
    return a[e_i[0] - hw_y_i: e_i[0] + hw_y_i + 1, e_i[1] - hw_x_i: e_i[1] + hw_x_i + 1]

kwds={
    'a': extract_window(A.to_numpy(), element, window_halfwidth_y_i, window_halfwidth_x_i),
    # 'b' must be sliced from B, not from A
    'b': extract_window(B.to_numpy(), element, window_halfwidth_y_i, window_halfwidth_x_i),
    'e_i': element,
    'hw_x_i': window_halfwidth_x_i,
    'hw_y_i': window_halfwidth_y_i
}
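To get a rough feel for the gain, here is a minimal sketch that compares the pickled size of a full array against that of a single window (multiprocessing pickles the arguments it sends to workers). The grid size, the element position, the half-widths and the `payload_bytes` helper are made up for illustration and are not taken from the repository; the sketch also assumes the window lies fully inside the array, since a negative start index would silently wrap around in NumPy.

```python
import pickle

import numpy as np


def extract_window(a, e_i, hw_y_i, hw_x_i):
    """Return the window of `a` centred on e_i = (row, col).

    The window spans 2*hw_y_i + 1 rows and 2*hw_x_i + 1 columns.
    Assumes the window lies fully inside `a` (no edge handling here).
    """
    row, col = e_i
    return a[row - hw_y_i: row + hw_y_i + 1, col - hw_x_i: col + hw_x_i + 1]


def payload_bytes(obj):
    """Hypothetical helper: size of the pickled object, in bytes."""
    return len(pickle.dumps(obj))


# Illustrative values only: a 500x500 grid and an 11x11 window.
rng = np.random.default_rng(0)
a_full = rng.random((500, 500))
element = (250, 250)
hw_y, hw_x = 5, 5

a_win = extract_window(a_full, element, hw_y, hw_x)

full_size = payload_bytes(a_full)
window_size = payload_bytes(a_win)
print(f"full array: {full_size} bytes, window: {window_size} bytes")
```

The window is a NumPy view, but pickling it serializes only the window's own elements, so the per-task payload scales with the window size rather than with the grid size.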

Parallelism setup: avoid passing the full arrays to each process #3