automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Running predict() in parallel. #1091

Open andersonpiata opened 3 years ago

andersonpiata commented 3 years ago

Hello.

I've trained a model with autosklearn and now I want to deploy it with a Flask API. I've serialized the model with joblib and each request to the Flask API runs predict() on about 10 to 20 rows of a Pandas dataframe.
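For context, a minimal sketch of the setup described above might look like this (the file name, endpoint, and payload format are illustrative, not taken from the original post):

```python
# Minimal sketch of a Flask API serving a serialized auto-sklearn model.
# File name, route and payload layout are assumptions for illustration only.
import joblib
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the joblib-serialized auto-sklearn model once at startup.
model = joblib.load("automl_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Each request carries roughly 10 to 20 rows as a list of JSON records.
    df = pd.DataFrame(request.get_json())
    preds = model.predict(df)
    return jsonify(predictions=preds.tolist())
```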

A single request to the API usually completes in about 80 ms, but if I run 10 requests simultaneously each one takes about 1400 ms, which is more than the 800 ms it takes to run those requests one after another.

Can someone offer some insight on why running predict() in parallel has such a bad performance?

P.S.: CPU usage seems to be about the same whether I run the requests in series or in parallel, and I've already tried using a different model object for each predict() call, with no success.

P.S.: I'm using version 0.8.0 of auto-sklearn, mostly because more recent versions don't include the regressor ridge_regression, which works best for my training data.

Ubuntu 20.04.2
No virtual environment
Python 3.8.5
Auto-sklearn version 0.8.0

mfeurer commented 3 years ago

Hi @andersonpiata, thanks for your interest in auto-sklearn. I guess you're observing quite some overhead here. What happens when you go parallel is that auto-sklearn runs the different models in the final ensemble in parallel, which is most likely not faster for 10 or 20 rows. Also, depending on the models you have and your environment variables, scikit-learn can use multiprocessing itself, see #1009.
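Besides exporting the variables in the shell, one way to pin those thread pools to a single thread is to set them from Python before numpy/scikit-learn are first imported. A minimal sketch:

```python
# Sketch: pin BLAS/OpenMP thread pools to a single thread. These variables
# only take effect if set before numpy / scikit-learn are first imported.
import os
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np   # noqa: E402  (imported after the variables are set)
import sklearn       # noqa: E402, F401
```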

andersonpiata commented 3 years ago

Hello @mfeurer , thank you for your reply.

I set those variables to 1 but the odd behavior continues. It takes much longer to run requests in parallel than to run them in series. The only difference is that I no longer see a CPU spike while running the requests in parallel.

```
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
```

mfeurer commented 3 years ago

What's the value for n_jobs you're using? Is there any difference based on the value?

andersonpiata commented 3 years ago

I've been using the default value of 1 for the n_jobs parameter on predict(). If I set it to 2 or 4, my single-request time actually increases, from 20 ms to 200 ms, and two or more simultaneous requests fail with a traceback that I can send here if it's relevant.

That happens independently of the values of the system variables mentioned above.
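For reference, a rough way to compare predict() latency across n_jobs values might look like the sketch below (it assumes a fitted auto-sklearn estimator `automl` and a small dataframe `X_small` of roughly 20 rows, mirroring the setup above):

```python
# Rough timing comparison of predict() for different n_jobs values.
# `automl` and `X_small` are assumed to exist; names are illustrative.
import time

for n_jobs in (1, 2, 4):
    start = time.perf_counter()
    for _ in range(50):
        automl.predict(X_small, n_jobs=n_jobs)
    elapsed = (time.perf_counter() - start) / 50
    print(f"n_jobs={n_jobs}: {elapsed * 1000:.1f} ms per call")
```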

andersonpiata commented 3 years ago

I did some further tests with larger samples, and now I can tell that setting those system variables to 1 actually helps, but the time in parallel is still worse than the time in series.

This is the average time I measured to finish 100 simultaneous requests, where parallel means Flask is set to answer the requests in parallel and series means it's answering them in series:

```
parallel, variables unset:    4757 ms
parallel, variables set to 1: 3389 ms
series, variables unset:      2464 ms
series, variables set to 1:   1779 ms
```

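A measurement like the one above could be reproduced with a small client script that fires 100 requests at the Flask endpoint, either concurrently or one after another, and compares wall-clock times. The sketch below is illustrative only; the URL, payload, and worker counts are assumptions, not taken from the original post:

```python
# Sketch: send 100 requests to the Flask endpoint, concurrently vs. serially,
# and report total wall-clock time for each mode.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:5000/predict"
payload = [{"feature_a": 1.0, "feature_b": 2.0}] * 15  # ~15 rows per request

def call_api(_):
    return requests.post(URL, json=payload)

for workers, label in [(10, "parallel"), (1, "series")]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(call_api, range(100)))
    print(f"{label}: {(time.perf_counter() - start) * 1000:.0f} ms total")
```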
mfeurer commented 3 years ago

I think I now get the question - the issue is that running Flask in parallel slows down the prediction, not using Auto-sklearn's n_jobs when running predict(). Is that correct?

andersonpiata commented 3 years ago

I don't think it's Flask that is causing the longer times when running the predictions in parallel. One reason is that predictions from models that didn't converge very well (r2 around 0.6) take the same amount of time to run in series and in parallel. The differences in the times I listed above only appear with better convergence (r2 around 0.8).