Closed orlandocosta76 closed 6 years ago
In the meantime, I've created a notebook with the function wrap_predict_with_sarimax
inside utils.py, and this code runs using the suggestion for Windows here:
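from joblib import Parallel, delayed
from tqdm import tqdm_notebook
from utils import wrap_predict_with_sarimax

if __name__ == '__main__':
    train, test = load_data()
    stores = train.index.get_level_values('Store').unique()
    # One wrap_predict_with_sarimax call per store, dispatched to all available cores
    res = Parallel(n_jobs=-1)(delayed(wrap_predict_with_sarimax)(df_=train, store_nr=store_nr)
                              for store_nr in tqdm_notebook(stores))
    res = dict(res)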
This takes 1 min 6 s, whereas the non-joblib version takes 37 s. It seems joblib is not using more than one CPU, and the extra time may be due to overhead (I've tried n_jobs=1, 2 and 3 and the result is still around 1 min).
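A minimal sketch of how the comparison above could be reproduced outside the notebook. fake_fit is a hypothetical CPU-bound stand-in for wrap_predict_with_sarimax (not the real function), and the store numbers are made up. Timings depend heavily on the machine and on how much work each task does, so none are quoted here.

```python
import time

from joblib import Parallel, delayed


def fake_fit(store_nr, n_iter=200_000):
    """Hypothetical CPU-bound stand-in for wrap_predict_with_sarimax."""
    total = 0
    for i in range(n_iter):
        total += (i * store_nr) % 7
    return store_nr, total


def time_run(stores, n_jobs):
    """Return (elapsed_seconds, results_dict) for one setting of n_jobs.

    n_jobs=None means a plain Python loop without joblib.
    """
    start = time.perf_counter()
    if n_jobs is None:
        pairs = [fake_fit(s) for s in stores]
    else:
        pairs = Parallel(n_jobs=n_jobs)(delayed(fake_fit)(s) for s in stores)
    return time.perf_counter() - start, dict(pairs)


stores = list(range(1, 9))  # made-up store numbers
timings = {}
results = {}
for n_jobs in (None, 1, 2):
    timings[n_jobs], results[n_jobs] = time_run(stores, n_jobs)
    label = "plain loop" if n_jobs is None else f"n_jobs={n_jobs}"
    print(f"{label}: {timings[n_jobs]:.2f}s")
```

If the joblib runs are no faster than the plain loop even for larger n_iter, that would suggest the workers are not running in parallel, rather than the slowdown being only start-up overhead.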
But the strange thing is that if I remove the line:
if __name__ == '__main__':
the code still runs. Maybe joblib has problems running inside the notebook because of previously executed code, and/or because not all of the code in LN3 that runs under joblib is defined inside a function.
PS: This was run on an Intel i7-7600U (2 cores, 4 threads), and the load_data function was defined inside the notebook where joblib is invoked:
import pandas as pd

def load_data():
    idx = pd.IndexSlice
    train = pd.read_csv('../data/train.csv')
    test = pd.read_csv('../data/test.csv')  # read but not used below
    train.Date = pd.to_datetime(train.Date, format='%Y-%m-%d')
    train = train.set_index(['Date', 'Store'])
    train = train.sort_index()
    # Hold out the last 4 days of the training data as a local test set
    last_day_in_index = train.index.get_level_values('Date').max()
    new_last_day_train = last_day_in_index - pd.DateOffset(days=4)
    new_first_day_test = last_day_in_index - pd.DateOffset(days=3)
    new_train = train.loc[idx[:new_last_day_train, :], :]
    new_test = train.loc[idx[new_first_day_test:, :], :]
    return new_train, new_test
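To make the split in load_data easier to check, here is a small sketch on a made-up frame. The dates, the two stores and the Sales column are illustrative assumptions, not the real train.csv. It applies the same IndexSlice logic: everything up to four days before the last date goes to the training part, and the final four days become the test part.

```python
import pandas as pd

idx = pd.IndexSlice

# Made-up data: 10 consecutive days for 2 stores (not the real train.csv)
dates = pd.date_range('2015-07-01', periods=10, freq='D')
toy = pd.DataFrame(
    [(d, s, 100 * s + i) for i, d in enumerate(dates) for s in (1, 2)],
    columns=['Date', 'Store', 'Sales'],
).set_index(['Date', 'Store']).sort_index()

# Same cut-offs as in load_data
last_day = toy.index.get_level_values('Date').max()
train_part = toy.loc[idx[:last_day - pd.DateOffset(days=4), :], :]
test_part = toy.loc[idx[last_day - pd.DateOffset(days=3):, :], :]

print(train_part.index.get_level_values('Date').nunique(), "training days")
print(test_part.index.get_level_values('Date').nunique(), "test days")
```

The index has to be sorted before slicing a MultiIndex this way, which is why load_data calls sort_index() first.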
Later, I ran the same notebook on another laptop with an i7-6700 (4 cores, 8 threads) and the results are more consistent: 22 s with n_jobs=-1 (meaning all CPUs) versus 40 s single-threaded. I guess the earlier results were because I was running on a machine with only 2 cores.
Still, it is strange that joblib did not run within learning notebook 3. It may be related to the "main" safeguard I described earlier, or to the fact that a lot of code outside any function runs before that point. I guess a rule of thumb for using joblib on Windows is to wrap all the code inside a function, as mentioned in the documentation.
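The rule of thumb could look roughly like the sketch below, assuming a script or module rather than loose notebook cells. predict_one is a hypothetical placeholder for wrap_predict_with_sarimax, and main is just an illustrative name. The worker function lives at module level, the parallel call lives inside a function, and nothing runs at import time except under the guard.

```python
from joblib import Parallel, cpu_count, delayed


def predict_one(store_nr):
    """Hypothetical placeholder for wrap_predict_with_sarimax."""
    return store_nr, store_nr ** 2


def main(stores, n_jobs=-1):
    """Run predict_one for every store and collect the results in a dict."""
    print(f"joblib sees {cpu_count()} CPUs")
    res = Parallel(n_jobs=n_jobs)(delayed(predict_one)(s) for s in stores)
    return dict(res)


if __name__ == '__main__':
    # Only executed when run as a script, not when a worker imports the module
    forecasts = main(range(1, 6))
    print(forecasts)
```

Printing cpu_count() first is a cheap way to see how many workers n_jobs=-1 will actually use on a given machine.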
PS: Attached is a file containing this code, based on LN3, just to test joblib: joblib_test.zip
Since I kind of solved this myself, I will close the issue.
When trying to run cell 67:
It starts by printing a small progress bar, as you can see below, but then it stalls and never finishes. I'm not sure why. Any idea of the root cause?
Here is the result of conda env export: ldsa.txt
PS: As a side note, cell 63 runs without this lib in about 37.5 secs.