Closed gautambak closed 4 years ago
Can you try setting the context of the imputer explicitly before calling predict, for instance like:

import mxnet as mx

gpu_context = mx.gpu(0)  # pass your GPU's device id as an integer
imputer.ctx = gpu_context
imputed = imputer.predict(df_test)
It doesn't seem to work. When I predict, one Python process is maxed at 100% for most of the run, and the GPU is only used for about 5% of the time. I'm eyeing the resource usage to see this.
I'm using a DataFrame of 600k rows × 200 columns, if that makes a difference.
One thing that could cause this is that the preprocessing is done only on the CPU, not on the GPU.
If your data frame contains text data, the preprocessing of a large dataframe can take quite long, followed by a short burst of GPU load, if GPUs are available.
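To confirm that the slow part really is the CPU-bound preprocessing rather than the GPU pass, a wall-clock timer around the call is usually enough. This is a generic sketch in plain Python; the commented-out `imputer.predict(df_test)` line refers to the call from this thread and is not part of the helper itself:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Hypothetical usage with the imputer from this thread:
# imputed = timed("predict", imputer.predict, df_test)
```

Watching `nvidia-smi` (or a CPU monitor) while the timer runs tells you which phase the time goes to.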
We had other implementations with parallelized and distributed preprocessing on Spark/Dask, but there just weren't enough use cases to keep supporting that functionality.
I'm closing this for now, as this seems to be the expected behaviour - but we'll keep it in mind in case we do a major refactoring of the preprocessing units.
And of course, this is open-source software: if you feel like building a Dask (or any other parallelized/distributed) preprocessing module, we'd be thrilled to merge your pull request.
That helps and makes sense - thank you, Felix. I'll experiment with removing the text/categorical columns from the DataFrame.
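For reference, dropping the free-text columns before imputation can be done with pandas' dtype selection. This is a sketch with a made-up DataFrame; `object` dtype is where pandas usually stores strings, so excluding it keeps the numeric columns (note this would also drop genuinely categorical string columns, so select columns explicitly if you need to keep some of them):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [30, 41, None],
    "income": [50_000, None, 62_000],
    "notes": ["free text", "more text", "even more"],  # slow-to-preprocess text
})

# 'object' dtype usually holds strings in pandas; keep everything else.
numeric_only = df.select_dtypes(exclude="object")
print(list(numeric_only.columns))  # -> ['age', 'income']
```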
I definitely want to contribute. I'm just not the greatest programmer. I'll keep it in mind as a side project to learn from. Thanks again for all your help - this tool is really great.
Hi There,
I could be wrong, but it appears that the GPU is mainly used during training. When I train my model, I see the GPU speed things up, but when I'm doing predictions it uses a single CPU core. For my large dataset, I'm noticing it spends more time on prediction than on training.
Is there anything I can do to leverage the GPU for predictions as well? You can reproduce this with the tutorial by using a large dataset (1M rows × 200 columns) and running imputed = imputer.predict(df_test).