MLBazaar / MLPrimitives

Primitives for machine learning and data science.
https://mlbazaar.github.io/MLPrimitives
MIT License

Inability to scale Keras primitives #258

Closed · sarahmish closed this issue 3 years ago

sarahmish commented 3 years ago

Description

After upgrading MLPrimitives to version 0.3, Orion benchmarking pipelines fail when run in parallel (whether using dask or multiprocessing). With this latest release, tensorflow dramatically changed the underlying computation and composition of models, which I believe is the reason for this breakage.

For reference, I am using the lstm_dynamic_threshold pipeline in Orion, which uses the keras.Sequential.LSTMTimeSeriesRegressor primitive. The run first emits warnings indicating that excessive computation (retracing) is happening in the adapter; I presume some tweaks are needed for this to work properly:

WARNING:tensorflow:5 out of the last 12 calls to <function 
Model.make_test_function.<locals>.test_function at 0x7f218041c4d0> 
triggered tf.function retracing. 
Tracing is expensive and the excessive number of tracings could be due to 
(1) creating @tf.function repeatedly in a loop, 
(2) passing tensors with different shapes, 
(3) passing Python objects instead of tensors. 
For (1), please define your @tf.function outside of the loop. 
For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. 
For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

After 21 pipeline runs, the code fails with the dreaded Segmentation fault (core dumped).

Note: This problem only occurs when attempting to execute pipelines in parallel. When benchmarking serially, there is no issue.
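
For a sense of the pattern involved, here is a minimal sketch of parallel execution: a small pool of worker processes, each building, fitting and evaluating a fresh Keras model many times. The model, data and pool sizes are made up for illustration and this is not the actual Orion benchmarking code.

```python
import multiprocessing as mp

import numpy as np


def run_once(seed):
    """One simulated pipeline run: build, fit and evaluate a tiny LSTM model."""
    import tensorflow as tf  # imported in the worker, as in the parallel benchmark

    rng = np.random.default_rng(seed)
    X = rng.random((64, 10, 1)).astype("float32")
    y = rng.random((64, 1)).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(8, input_shape=(10, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=1, verbose=0)
    return float(model.evaluate(X, y, verbose=0))


if __name__ == "__main__":
    # Reusing a small pool of workers for many runs is the pattern that
    # produces the retracing warnings and, eventually, the crash.
    with mp.Pool(processes=2) as pool:
        print(pool.map(run_once, range(30)))
```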

sarahmish commented 3 years ago

After investigating the system settings, I found that the cause of this is a memory issue. Running each pipeline in its own multiprocessing worker, so that the memory is released once the computation finishes, makes it work fine.
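
A rough sketch of that workaround, assuming a hypothetical run_pipeline function standing in for one benchmark run (the worker body and signal names are placeholders, not Orion's actual API):

```python
import multiprocessing as mp


def run_pipeline(signal_name):
    """Placeholder for a single benchmark run; the real pipeline fitting and
    scoring with the Keras primitives would happen here, inside the child."""
    import tensorflow as tf  # keep all TensorFlow state inside the worker
    # ... build the pipeline, fit, compute scores ...
    return signal_name, "ok"


if __name__ == "__main__":
    signals = ["S-1", "S-2", "S-3"]  # hypothetical signal names

    # maxtasksperchild=1 gives every run a fresh worker process, so whatever
    # memory TensorFlow accumulated during the run is released when it exits.
    with mp.Pool(processes=2, maxtasksperchild=1) as pool:
        results = pool.map(run_pipeline, signals)

    print(dict(results))
```

The key point is that each run lives in a short-lived process, so memory never accumulates across runs.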

Closing.