Open dennissergeev opened 6 years ago
Proof of concept:
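import concurrent.futures

import pandas as pd


def foo(df):
    # Skip short tracks (< 300 km) and tracks where more than 20% of points
    # have a non-zero vortex_type; every other track gets category 99.
    flag = True
    if df.total_dist_km < 300.:
        flag = False
    if flag:
        if ((df.vortex_type != 0).sum() / df.shape[0] > 0.2):
            flag = False
    if flag:
        df['cat'] = 99
    return df


# Apply foo to every group in 4 worker processes, then reassemble the data.
with concurrent.futures.ProcessPoolExecutor(4) as pool:
    TR.data = pd.concat(list(pool.map(foo, [j for i, j in TR.gb], chunksize=10)))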
So far this gives no speed-up, but it may for heavier computations.
For the track density calculation, concurrent execution reduces the total time by a factor of 2. Example:
with concurrent.futures.ProcessPoolExecutor(4) as pool:
    res = list(pool.map(density, gb_list, chunksize=10))
dens = np.array(res).sum(axis=0)
Needs to be investigated further.
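One way to investigate this is to time the serial and the pooled versions of the same call side by side. The helper below is only a sketch: `best_time` is a hypothetical name, not part of the package. It uses wall-clock time, which is what matters when the work is spread over several processes.

```python
import time


def best_time(func, *args, repeats=3, **kwargs):
    """Return (best wall-clock seconds, last result) over `repeats` calls.

    Hypothetical helper for comparing a serial call with a pooled one.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Keep the fastest run to reduce noise from other system activity.
        best = min(best, time.perf_counter() - start)
    return best, result
```

Taking the best of a few repeats, rather than a single run, makes it easier to tell a real speed-up from process start-up overhead.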
Add an option to execute functions in parallel processes
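A minimal sketch of what such an option might look like, assuming a single switch between serial and parallel execution. The names `map_groups`, `n_workers` and `track_length` are hypothetical and only illustrate the idea; they are not the package's API.

```python
import concurrent.futures


def map_groups(func, groups, n_workers=1, chunksize=10):
    """Apply `func` to every item of `groups`, optionally in parallel.

    Hypothetical helper: n_workers <= 1 runs serially in the current
    process; a larger value uses a pool of that many worker processes.
    Results come back in the same order as `groups` in both cases.
    """
    groups = list(groups)
    if n_workers <= 1:
        return [func(g) for g in groups]
    with concurrent.futures.ProcessPoolExecutor(n_workers) as pool:
        return list(pool.map(func, groups, chunksize=chunksize))


def track_length(track):
    """Toy stand-in for a per-track computation.

    It is defined at module level so that it can be pickled and sent
    to worker processes.
    """
    return len(track)
```

With the default `n_workers=1` the call behaves like a plain loop, so existing code paths would stay unchanged. Passing, say, `n_workers=4` would dispatch the same work to four processes; on platforms that spawn rather than fork workers, that call should sit under an `if __name__ == "__main__":` guard.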