dvschultz / ai

bash script to install Artificial Images materials
152 stars 55 forks

StyleGAN2_Colab_Train.ipynb exits abruptly #3

Open alsino opened 4 years ago

alsino commented 4 years ago

Hi there,

first of all, thanks for the great work!

I have an issue when running the StyleGAN2_Colab_Train.ipynb notebook on Colab. After downloading the StyleGAN2 repo and converting the images to TFRecords, I run the training script (run_training.py), but the script exits while compiling the TensorFlow plugin "upfirdn_2d.cu" (see image).

Any idea why this happens and how to fix it?

[Screenshot: error]

chrsmlls333 commented 4 years ago

Did this happen multiple times? This is usually the code given by the stop button to end the script early.
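To tell a manual stop apart from other kinds of termination, it can help to decode the exit status. The sketch below is only an illustration under POSIX conventions: `describe_exit` is a hypothetical helper name, and it assumes you have the numeric return code of the training process (for example from `subprocess.run(...).returncode`, or from the shell's `$?`).

```python
import signal


def describe_exit(returncode: int) -> str:
    """Describe a process exit status in words (POSIX conventions)."""
    if returncode == 0:
        return "success"
    # Python's subprocess reports "killed by signal N" as -N;
    # shells conventionally report the same event as 128 + N.
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with error code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        name = f"signal {signum}"
    return f"terminated by {name}"


print(describe_exit(-2))
print(describe_exit(137))
```

An interrupt such as Ctrl+C shows up as SIGINT, while SIGKILL is the signal the Linux out-of-memory killer sends. Note that a code above 128 is only a convention: a program can also choose such a value itself.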

alsino commented 4 years ago

Yes, I've tried at least 10 times to run the training script. It just exits without any further error code or notification. 🤷🏽‍♂️

Could it have to do with the notebook running out of memory?
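One rough way to check this is to look at how much RAM the runtime has free before and while the training script runs. The following is a minimal sketch, assuming a Linux runtime that exposes `/proc/meminfo`; the helper names `parse_meminfo` and `ram_summary` are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def ram_summary(path: str = "/proc/meminfo") -> str:
    """Return total and available RAM in GiB (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    total = info["MemTotal"] / 1024 ** 2          # kB -> GiB
    available = info["MemAvailable"] / 1024 ** 2  # kB -> GiB
    return f"total {total:.1f} GiB, available {available:.1f} GiB"
```

Calling `print(ram_summary())` in a notebook cell right before training, and again from a second cell or terminal while it runs, should show whether available memory drops toward zero just before the script exits.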

andersonfaaria commented 4 years ago

I'm facing the same problem. I thought it was because I was trying to train on 32x32 files, but apparently not.

Also, when I tried deleting all the pkl files to train from scratch, it gave me an index out of range error in the pkl.

molo32 commented 4 years ago

Could anyone solve this? I also have this problem. I have noticed that it consumes all the RAM. There must be a way to make it use less RAM. Does anyone have an idea?

andersonfaaria commented 4 years ago

I solved it by not using the FFHQ database and training from scratch. This way you don't need to load everything into memory, so you stay under the 16 GB of RAM and don't hit Colab's limits. I also made some changes to the code to include some features from the skyflynil fork while still maintaining the improvements made by Derrick: https://github.com/andersonfaaria/stylegan2

molo32 commented 4 years ago

I tried your repository, but it gives me the same error. Can you tell me which Colab notebook is the fixed one?

andersonfaaria commented 4 years ago

You get the error because you're trying to load the pretrained network, which exceeds the maximum amount of RAM that Colab supports. Don't use a pretrained network, and don't use FFHQ at all. Train at smaller dimensions using other datasets and you'll be fine.

dvschultz commented 4 years ago

@molo32 I’ve never seen this... at this particular step it's compiling the custom CUDA scripts... maybe make sure you're using CUDA 10.x? (I can't imagine you would be using anything else.) Maybe also check to make sure you have a P100? I don’t think you’d run into this with a K80, but it's possible.
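Both checks can be done from a notebook cell. The sketch below is one way to do it, assuming `nvcc` and `nvidia-smi` are on the PATH of the runtime; `run_tool` and `parse_cuda_release` are hypothetical helper names, and the parsing relies on the `release X.Y` text that `nvcc --version` prints.

```python
import re
import shutil
import subprocess


def run_tool(args):
    """Return a tool's stdout, or None if it is missing or fails."""
    if shutil.which(args[0]) is None:
        return None
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    return result.stdout if result.returncode == 0 else None


def parse_cuda_release(nvcc_output):
    """Extract e.g. '10.1' from the 'release 10.1' text nvcc prints."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output or "")
    return match.group(1) if match else None


# CUDA toolkit version as reported by the compiler that builds the plugins.
cuda = parse_cuda_release(run_tool(["nvcc", "--version"]))
# GPU model and total memory as reported by the driver.
gpu = run_tool(["nvidia-smi", "--query-gpu=name,memory.total",
                "--format=csv,noheader"])

print("CUDA toolkit:", cuda or "nvcc not found")
print("GPU:", (gpu or "nvidia-smi not available").strip())
```

If the GPU line does not show the expected card, requesting a fresh runtime is the usual way to get a different assignment.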