anishathalye / neural-style

Neural style in TensorFlow! 🎨
https://anishathalye.com/an-ai-that-can-mimic-any-artist/
GNU General Public License v3.0

cpu shuts down #176

Closed (gokhalen closed 3 years ago)

gokhalen commented 3 years ago

I'm trying to run neural-style on an Intel i5-11400F 2.6 GHz processor with 6 physical and 12 logical cores and 16 GB RAM. The OS is Windows 10 running WSL2 with Ubuntu Linux. After approximately 15 iterations, neural-style causes the machine to shut down instantly. This occurs on both TensorFlow 2.4.0 and 2.6. I have trained other neural nets on the same machine using TF 2.4; they occupy all 12 logical cores at 100% without any issues, take about 3 hours to train, and use approximately 3.5 GB RAM. So I'm guessing that the CPU is not at fault. Should I be looking at throttling the CPU frequency in my BIOS? Has anyone else seen this with neural-style?

anishathalye commented 3 years ago

What exactly do you mean by "CPU to shut down"? Does your OS crash, and so the system shuts down? Can you look in the OS logs and figure out what was the cause of the crash?

It is unlikely that the neural-style code itself is to blame. It's probably an issue with your system, lower in the software stack or with hardware.

One guess is that you're processing a large image, which requires a lot of RAM, and your system doesn't handle OOM gracefully and shuts down. Take a look at these issues and see if any of the advice there helps: https://github.com/anishathalye/neural-style/issues?q=out+of+memory. In particular, one thing you can try is running on a really tiny content image (which requires less RAM) to see if that works.

gokhalen commented 3 years ago

What I mean by "CPU to shut down" is that the computer powers off as if someone had cut its power. No warnings or messages from the system, nothing. I will check the logs and see if it is a RAM issue.

gokhalen commented 3 years ago

Also, the error occurs while processing the first example in the examples directory, which I think is small.

anishathalye commented 3 years ago

Try much smaller, e.g. --width 50.

gokhalen commented 3 years ago

--width 50 seems to work.
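Since `--width 50` works, one way to narrow things down is to step the width up until the failure reappears. The following is a rough sketch of that idea, not something from the repository: it assumes it is run from the neural-style checkout, that the first example lives at `examples/1-content.jpg` and `examples/1-style.jpg`, and the helper names (`build_command`, `widths_to_try`, `sweep`) and output file names are made up for illustration.

```python
import subprocess
import sys


def build_command(width, content="examples/1-content.jpg",
                  styles="examples/1-style.jpg"):
    """Build the neural_style.py command line for one width.

    The example paths are assumptions; point them at any content/style pair.
    """
    return [
        sys.executable, "neural_style.py",
        "--content", content,
        "--styles", styles,
        "--output", f"out-{width}.jpg",  # hypothetical output name
        "--width", str(width),
    ]


def widths_to_try(start=50, limit=800):
    """Yield widths that double from `start` up to and including `limit`."""
    width = start
    while width <= limit:
        yield width
        width *= 2


def sweep(run=subprocess.run):
    """Run each width in turn and stop at the first non-zero exit code."""
    for width in widths_to_try():
        # Flush so the last attempted width is visible even if the run dies.
        print(f"trying --width {width}", flush=True)
        result = run(build_command(width))
        if result.returncode != 0:
            print(f"failed at --width {width}")
            break
```

Calling `sweep()` tries widths 50, 100, 200, 400, and 800 in order. If the machine powers off outright, nothing after that point is reported, so the last `trying --width ...` line on screen is the only hint about which size triggered it.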

anishathalye commented 3 years ago

Okay, so 16 GB RAM should be enough to handle the full-size example image. But maybe your WSL2 setup doesn't let the Linux process use the full memory? And it isn't gracefully handling the OOM situation? (A user process shouldn't be able to crash the OS, even by allocating a bunch of memory.)
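To check how much memory the Linux side actually sees, one option is to read `/proc/meminfo` from inside WSL2. WSL2 runs Linux in a virtual machine, so the total reported there can be lower than the physical RAM on the host. Below is a minimal sketch; the function names are hypothetical and it only applies on Linux.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def describe_memory(fields):
    """Summarize total and available memory in GiB (kB / 1024**2)."""
    total_gib = fields.get("MemTotal", 0) / 1024**2
    avail_gib = fields.get("MemAvailable", 0) / 1024**2
    return f"total {total_gib:.1f} GiB, available {avail_gib:.1f} GiB"


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(describe_memory(parse_meminfo(meminfo.read_text())))
    else:
        print("/proc/meminfo not found; this check only applies on Linux")
```

If the total comes out well below 16 GB, the VM's memory limit is a plausible suspect; WSL2's limit can be adjusted through the `memory` setting in the `[wsl2]` section of `.wslconfig` on the Windows side.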

gokhalen commented 3 years ago

You are right of course. I'll try running outside WSL.

gokhalen commented 3 years ago

I ran into the same problem running neural-style under Anaconda. It seems that the problem was Turbo Boost. Disabling it as given in section 2.2 of this seems to cure the problem. I think the CPU was overheating, but I have no logs to back up this claim.
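For anyone who wants logs to confirm or rule out overheating, here is a small sketch that records temperatures to a file while neural-style runs. It assumes a native Linux system that exposes sensors under `/sys/class/thermal/thermal_zone*/temp` (values in millidegrees Celsius); these files are typically not available inside WSL2, and which zone corresponds to the CPU varies by machine. The function names and the log file name are made up for illustration.

```python
import glob
import os
import time


def read_temps_celsius(pattern="/sys/class/thermal/thermal_zone*/temp"):
    """Return {zone_name: degrees Celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # The kernel reports millidegrees Celsius.
                temps[zone] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def format_log_line(timestamp, temps):
    """Format one log line: '<unix time> zone=tempC ...'."""
    readings = " ".join(f"{zone}={value:.1f}C"
                        for zone, value in sorted(temps.items()))
    return f"{timestamp:.0f} {readings}".rstrip()


def log_temps(log_path="cpu_temps.log", interval_s=1.0, samples=10):
    """Append `samples` readings to `log_path`, syncing after each one."""
    with open(log_path, "a") as log:
        for _ in range(samples):
            log.write(format_log_line(time.time(), read_temps_celsius()) + "\n")
            # Flush and fsync so the last reading is more likely to survive
            # an abrupt power-off.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Running `log_temps(samples=600)` in a second terminal covers about ten minutes. If the machine shuts off mid-run, the final lines of `cpu_temps.log` show the last temperatures recorded before it went down.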

anishathalye commented 3 years ago

Thanks for the update, glad you got it figured out!