dennissergeev / octant

Objective Cyclone Tracking ANalysis Tools

http://octant-docs.rtfd.io/

MIT License

Parallelization #1

Open dennissergeev opened 6 years ago

dennissergeev commented 6 years ago

Add an option to execute functions in parallel processes
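One way such an option could look is sketched below. This is only an illustration, not octant's API: the helper name `map_groups` and its `n_workers` and `chunksize` parameters are assumptions made for this example. With `n_workers=1` it stays serial, so the default behaviour would not change.

```python
import concurrent.futures


def map_groups(func, groups, n_workers=1, chunksize=10):
    """Apply func to every item of groups, optionally in parallel processes.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    groups = list(groups)
    if n_workers is None or n_workers <= 1:
        # Serial fallback: no process start-up or pickling overhead
        return [func(g) for g in groups]
    # func and every item must be picklable to cross the process boundary
    with concurrent.futures.ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(func, groups, chunksize=chunksize))
```

On platforms that start worker processes by spawning a fresh interpreter, the calling script would need an `if __name__ == "__main__":` guard around the parallel call.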

dennissergeev commented 6 years ago

Proof of concept:

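import concurrent.futures

import pandas as pd

def foo(df):
    # Mark a track with category 99 if it is at least 300 km long and
    # no more than 20% of its points have a non-zero vortex type
    flag = True
    if df.total_dist_km < 300.:
        flag = False
    if flag:
        if ((df.vortex_type != 0).sum() / df.shape[0] > 0.2):
            flag = False
    if flag:
        df['cat'] = 99
    return df

# Iterating over TR.gb yields (key, track) pairs; each track goes to a worker process
with concurrent.futures.ProcessPoolExecutor(4) as pool:
    TR.data = pd.concat(list(pool.map(foo, [j for i, j in TR.gb], chunksize=10)))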

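So far this gives no speed-up, possibly because the per-track work is too light to offset the cost of starting processes and pickling each DataFrame, but it may pay off for heavier computations.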
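To check whether a given function benefits from parallel execution, the serial and parallel runs can be timed side by side. A minimal sketch, assuming a hypothetical `timed` helper (not part of octant):

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

The same callable can then be wrapped once around a plain loop over the tracks and once around the `pool.map` call, and the two elapsed times compared. The pool start-up cost is part of the parallel timing, which is what matters for short jobs.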

dennissergeev commented 6 years ago

For track density calculation, concurrent execution cuts the total time by about a factor of 2. Example:

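import concurrent.futures
import numpy as np
with concurrent.futures.ProcessPoolExecutor(4) as pool:
    res = list(pool.map(density, gb_list, chunksize=10))
# Stack the per-track results and sum them into one density field
dens = np.array(res).sum(axis=0)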

Needs to be investigated further.
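For further investigation, the map-and-sum pattern above can be reproduced in a self-contained form. The sketch below is an assumption-laden stand-in: `point_density` is a hypothetical replacement for `density` that simply bins (lon, lat) points with `np.histogram2d`, and `total_density` is an illustrative name. It accumulates the sum incrementally instead of stacking every per-track array first.

```python
import functools

import numpy as np


def point_density(lonlat, lon_edges, lat_edges):
    """Count track points per grid cell (hypothetical stand-in for `density`)."""
    lon, lat = np.asarray(lonlat, dtype=float).T
    counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
    return counts


def total_density(tracks, lon_edges, lat_edges, map_func=map):
    """Sum per-track densities; pass a pool's map as map_func to run in parallel."""
    worker = functools.partial(
        point_density, lon_edges=lon_edges, lat_edges=lat_edges
    )
    total = np.zeros((len(lon_edges) - 1, len(lat_edges) - 1))
    for counts in map_func(worker, tracks):
        total += counts  # accumulate without holding all arrays in memory
    return total
```

With the default `map_func=map` this runs serially. Inside a `with concurrent.futures.ProcessPoolExecutor(4) as pool:` block, passing `map_func=functools.partial(pool.map, chunksize=10)` would distribute the per-track work in the same way as the example above.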