LDSSA / batch2-BLU03

Predict timeseries in the real world
MIT License

LN 3 joblib does not run properly on Windows #5

Closed orlandocosta76 closed 6 years ago

orlandocosta76 commented 6 years ago

When trying to run cell 67:

res = Parallel(n_jobs=-1)(delayed(wrap_predict_with_sarimax)(df_=new_train, store_nr=store_nr)
                          for store_nr in tqdm_notebook(stores))

It starts by printing a small progress bar, as you can see below, but then it stalls there and never finishes. I'm not sure about the reason. Any idea on the root cause?

[Screenshot: tqdm progress bar stalled at the start]

Here is the result of conda env export: ldsa.txt

PS: As a side note, cell 63 runs without this library in about 37.5 secs:

for store_nr in tqdm_notebook(stores):
    res[store_nr] = predict_with_sarimax(df_=new_train, store_nr=store_nr, n_steps=4)
orlandocosta76 commented 6 years ago

In the meantime, I've created a notebook with the function wrap_predict_with_sarimax inside utils.py, and this code runs using the suggestion for Windows here:

if __name__ == '__main__':
    train, test = load_data()
    stores = train.index.get_level_values('Store').unique()

    res = Parallel(n_jobs=-1)(delayed(wrap_predict_with_sarimax)(df_=train, store_nr=store_nr)
                              for store_nr in tqdm_notebook(stores))
    res = dict(res)

This takes 1 min 6 secs, whereas the non-joblib version takes 37 secs... it seems it is not using more than one CPU, and the extra time may be due to overhead (I've tried joblib with n_jobs=1, 2 and 3, and the result is still around 1 min).

But the strange thing is that if I remove the line `if __name__ == '__main__':`, the code still runs... maybe it has problems running inside the notebook because of previous code, and/or because not all the code in LN3 that is to run with joblib is defined inside a function.
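One plausible contributor (my assumption, not verified against the notebook): process-based backends on Windows start workers with the "spawn" method and serialize the target callable with pickle, which records a function by its qualified name rather than by value. A lambda or nested function fails to serialize outright, and a function that only exists in the notebook's `__main__` may not resolve inside a freshly spawned worker. A minimal sketch of the by-name rule:

```python
# Sketch (assumption): standard pickle records a function by its qualified
# name, so only callables that can be looked up under that name serialize.
import pickle

def top_level_worker(x):
    # Defined at module level under its own name -> picklable by reference.
    return x * 2

inline_worker = lambda x: x * 2  # qualname is '<lambda>', so the name lookup fails

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

print(is_picklable(top_level_worker))  # True
print(is_picklable(inline_worker))     # False
```

This is why moving wrap_predict_with_sarimax into utils.py (an importable module) is the robust fix, independent of the `__main__` guard.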

PS: This was run on an Intel i7-7600U (2 cores, 4 threads), and the load_data function was inside the notebook where joblib is being invoked:

import pandas as pd

def load_data():
    idx = pd.IndexSlice

    train = pd.read_csv('../data/train.csv')
    test = pd.read_csv('../data/test.csv')  # note: unused below; the split comes from train

    train.Date = pd.to_datetime(train.Date, format='%Y-%m-%d')
    train = train.set_index(['Date', 'Store'])
    train = train.sort_index()

    # Hold out the last 4 days of the training data as a test window
    last_day_in_index = train.index.get_level_values('Date').max()
    new_last_day_train = last_day_in_index - pd.DateOffset(days=4)
    new_first_day_test = last_day_in_index - pd.DateOffset(days=3)

    new_train = train.loc[idx[:new_last_day_train, :], :]
    new_test  = train.loc[idx[new_first_day_test:, :], :]

    return new_train, new_test
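The split arithmetic in load_data can be checked in isolation: the train cut-off is 4 days before the last date in the index, the test window starts 3 days before it, and the two do not overlap, so the test set covers the final 4 calendar days inclusive. A stdlib sketch (the dates are illustrative stand-ins, not taken from the actual data):

```python
# Illustrative dates only; the real values come from the train.csv index.
from datetime import date, timedelta

last_day = date(2015, 7, 31)                       # stand-in for the index max
new_last_day_train = last_day - timedelta(days=4)  # train ends here
new_first_day_test = last_day - timedelta(days=3)  # test starts here

# The boundaries are adjacent, and the test window spans 4 days inclusive.
window_days = (last_day - new_first_day_test).days + 1
print(new_last_day_train, new_first_day_test, window_days)
```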
orlandocosta76 commented 6 years ago

Later, I tried to run the same notebook on another laptop with an i7-6700 (4 cores, 8 threads), and the results are more consistent: 22 secs with n_jobs=-1 (meaning all CPUs) and 40 secs single-threaded. I guess the previous results might have been because I was running on a machine with only 2 cores.

Still, it is strange that joblib did not run within learning notebook 3; it may be related to the `__main__` safeguard I described earlier, or to the fact that a lot of code outside any function runs before that point. I guess a rule of thumb for using joblib on Windows is to wrap all the code inside a function, as mentioned in the documentation.
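The rule of thumb above can be sketched with the stdlib (joblib's process-based backends behave analogously on Windows): the worker lives at module top level and the parallel call sits under the `__main__` guard, so spawned children can re-import the module without re-running it. The names here are placeholders, not the notebook's actual functions:

```python
# Sketch of the Windows-safe layout (assumption: stdlib multiprocessing
# stands in for joblib's process backend; fit_store is a placeholder worker).
from multiprocessing import Pool

def fit_store(store_nr):
    # Placeholder for predict_with_sarimax: return (key, result) pairs
    # so the caller can build a dict, as in the joblib snippet above.
    return store_nr, store_nr * 10

if __name__ == '__main__':
    # On Windows, spawned workers re-import this module; the guard keeps
    # them from recursively launching their own pools.
    with Pool() as pool:
        res = dict(pool.map(fit_store, [1, 2, 3]))
    print(res)
```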

PS: Attached is a file that contains this code based on LN3, just to test joblib: joblib_test.zip

orlandocosta76 commented 6 years ago

Since I kind of solved this myself, I will close the issue.