Running a sequence of predict-train or train-predict methods fails when using DEEPaaS with TensorFlow on GPU. Executing predict-predict and/or train-train works.
Steps to Reproduce
If, after deploying the container, I start only predict, it works and I can repeat it.
If I start only training after deployment, it also works and can be repeated.
However, if I first start predict and then train, or vice versa, it fails.
Expected behavior
The functions should work regardless of the order in which they are executed.
Actual behavior
predict-train or train-predict fails; this could be TensorFlow-specific. The reason seems to be that predict and train run as two separate Linux processes: the process started first occupies the GPU, and the second one is left with too little GPU memory to perform its task.
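This matches TensorFlow 1.x's default behavior of mapping nearly all free GPU memory in the first process that creates a session. A minimal sketch of a possible per-process mitigation, assuming the model code can pass a session config (this is an illustration of the mechanism, not a confirmed fix for DEEPaaS):

```python
# Sketch (assumption, not from the report): configure a TensorFlow 1.x
# session so the process does not claim the whole GPU up front, leaving
# memory available for a second process started later.
import tensorflow as tf  # TF 1.12/1.14 API, as in the versions listed below

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
# Alternatively, cap the fraction of GPU memory this process may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

with tf.Session(config=config) as sess:
    # run the predict or train graph here
    pass
```

With the default config, the first session grabs (nearly) all GPU memory for the lifetime of its process, which would explain why a subsequent predict or train process fails while repeating the same method in the same process succeeds.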
Versions
DEEPaaS 1.0.1 and 1.2.0
TensorFlow 1.12.0 and 1.14.0
NVIDIA driver 418.56 on one site and 440.33.01 on another