apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Style transfer being killed during training #3128

Open JoeStrout opened 4 years ago

JoeStrout commented 4 years ago

turicreate is very consistently dying for us while training a style transfer. It always happens between 3000 and 4000 iterations. There is no helpful output; just:

| 3781         | 2.72202      | 35m 8s       |
zsh: killed     python3 test-john.py

There is no crash log found in /Library/Logs/DiagnosticReports, nor anything dumped to /cores (even after doing a ulimit -c unlimited).
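For anyone trying to reproduce this: a bare `killed` from the shell usually means the process received SIGKILL, which would be consistent with no crash report or core file being written. One way to confirm that is to launch the training script from a small wrapper and inspect the exit status. This is only a sketch; `describe_exit` and `run_and_report` are hypothetical helper names, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a human-readable description of a subprocess exit status."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative return code.
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_report(argv):
    """Run a command, wait for it to finish, and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_exit(completed.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 wrapper.py test-john.py
    print(run_and_report([sys.executable] + sys.argv[1:]))
```

If the wrapper reports `terminated by SIGKILL`, the process was killed from outside (for example by the operating system under memory pressure) rather than crashing on its own.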

The script we're using is attached. Various tweaks to this script (e.g. not using pretrained weights) all produce the same sad result. This is training on macOS 10.15.4 (build 19E287), and using an AMD Radeon RX Vega 64 external GPU.

test-john.py.zip

tinrocket commented 4 years ago

Joe's teammate, John, here. We have been using the GPU for the style transfer training. During training, process memory for Python has climbed to nearly 80 GB before the process is terminated.

Running the same training on the CPU uses less process memory for Python: under 2 GB.

The process has not gotten killed when training on the CPU, but I have not tested the CPU extensively.
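To put numbers on the memory growth over time, something like the following could be added at the top of the training script. It is a rough sketch using only the standard library; `peak_rss_bytes` and `start_memory_logger` are hypothetical names, not part of turicreate. Two assumptions to be aware of: it reports peak resident memory, which may differ from the figure Activity Monitor shows, and the background thread only gets to run if the training call releases Python's global interpreter lock.

```python
import resource
import sys
import threading


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def start_memory_logger(interval_s=30.0, stream=sys.stderr):
    """Log peak RSS every interval_s seconds; call .set() on the result to stop."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop has been set.
        while not stop.wait(interval_s):
            gib = peak_rss_bytes() / 2**30
            print(f"[mem] peak RSS: {gib:.2f} GiB", file=stream, flush=True)

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Calling `stop = start_memory_logger()` before training starts and `stop.set()` afterwards would interleave memory readings with the iteration table, which should show whether usage grows steadily with the iteration count.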

rainhut commented 4 years ago

Yeah, same old problems with turicreate and GPU. Apple must not be supporting this. Go to Google Colab and use tensorflow.

iwasnothing commented 4 years ago

I have the same problem. My Python script gets killed after 8000 iterations. I found that the virtual size of the process had reached over 60 GB. I'm not sure whether there is a memory leak.