Open gmaze opened 2 years ago
Actually, we could try, but the main issue here is that prediction for a single model is already done in parallel (handled by TensorFlow), so adding this higher level of parallelization can cause contention for computing resources. Ideally each model would have dedicated resources (a couple of cores each), but then the resources needed explode. I looked at this mainly for training, because it is the slowest part; with dask I think it would be possible to create a new PBS job for each training and merge the results afterwards, but I didn't go that far with automatic scaling in dask. Prediction seems so fast that by the time a job is created, everything would already be finished.
> So adding this higher level of parallelization can cause more conflict
Indeed, maybe TensorFlow could work with multithreading while the higher level, across models, runs on multiple processes.
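One way to avoid the resource contention mentioned above is to give each model process an explicit share of the cores. A minimal sketch, assuming a helper along these lines (the function name and parameters are illustrative, not part of the OSnet code base); the returned budget is what each worker would pass to TensorFlow's thread knobs (`tf.config.threading.set_intra_op_parallelism_threads`) before loading its model:

```python
import os

def threads_per_model(n_models, n_cores=None, reserve=0):
    """Split the available cores evenly between model processes.

    n_models : number of models predicted in parallel
    n_cores  : total cores to divide (defaults to os.cpu_count())
    reserve  : cores to keep free for the parent process / OS
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    # Each worker gets at least one core, even when oversubscribed.
    return max(1, (n_cores - reserve) // n_models)

# e.g. 4 models on an 8-core node -> 2 intra-op threads each
print(threads_per_model(4, n_cores=8))
```

This keeps the total TensorFlow thread count roughly equal to the core count instead of letting every model spawn a full set of threads.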
Are there any contraindications to making predictions in parallel? Using joblib, for instance.
Using the notebook implementation in Prediction.ipynb, I ended up with a trivial formulation, where `model` is one instance returned by `keras.models.load_model`.
This is implemented here: https://github.com/euroargodev/OSnet-GulfStream/blob/df8486789fb86b3c5746b004ba3573b769350636/osnet/facade.py#L92
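For the joblib question above, a minimal sketch of what the fan-out over models could look like. Everything here is illustrative, not taken from osnet/facade.py: `_predict_one`, `predict_ensemble`, and the dummy "prediction" are stand-ins, and in the real code each worker would call `keras.models.load_model(path)` and `model.predict(x)`. A thread pool is used because TensorFlow releases the GIL during `predict`; `joblib.Parallel(n_jobs=...)` with `delayed(...)` would give the same pattern with process-based workers.

```python
from concurrent.futures import ThreadPoolExecutor

def _predict_one(args):
    """Load one ensemble member and predict on the shared inputs.

    Hypothetical stand-in: the real version would be
        model = keras.models.load_model(model_path)
        return model.predict(x)
    """
    model_path, x = args
    return [v * 2 for v in x]  # dummy "prediction" for illustration

def predict_ensemble(model_paths, x, max_workers=None):
    """Run each ensemble member's prediction concurrently and
    return the per-model predictions in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_predict_one, [(p, x) for p in model_paths]))
```

Combined with a per-worker thread budget, this avoids the oversubscription concern: each worker runs its own TensorFlow prediction with a bounded number of threads.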
@loicbachelot @quai20, any thoughts on this, maybe?