Describe the bug
I started training a small 100K images of 256x256 resolution on a Tesla T4 on Colab, after a 5 hours of training I decided to resume on Kaggle with a Tesla P100, I used the same .pkl file where the training on Colab ended but the generated fakes are the same blurry image generated when the training first started on Colab. I used the same exact dataset for both trainings.
I am using the following command to train:
!python stylegan3/train.py --outdir "/content/runs" --cfg stylegan3-t --data "/content/anime-faces.zip" --batch-gpu 16 --gpus 1 --batch 32 --snap 5 --gamma 2 --metrics none --resume "/content/network.pkl"
To Reproduce
Steps to reproduce the behavior:
Start training on Colab (Tesla T4).
Resume training on Kaggle (Tesla P100).
Fakes generated where it left of on Colab:
Fakes generated after resuming on a different GPU on Kaggle:
Expected behavior
The generated fakes on the different GPU should be the same as where I left of at Colab.
Desktop (please complete the following information):
OS: Ubuntu 20.04
PyTorch 1.9.0+cu111
CUDA 11.1
NVIDIA driver version
Kaggle: 470.82.01
Colab: 460.32.03
GPU
Kaggle: Tesla P100
Colab: Tesla T4
Docker: no
Additional context
I only resumed via the .pkl file generated by Colab and nothing else.
Never mind I was able to fix it by copying the entire resume folder from Colab to Kaggle which contains the .json, fakes and log files and start training the .pkl file inside it.
Describe the bug I started training a small 100K images of 256x256 resolution on a Tesla T4 on Colab, after a 5 hours of training I decided to resume on Kaggle with a Tesla P100, I used the same .pkl file where the training on Colab ended but the generated fakes are the same blurry image generated when the training first started on Colab. I used the same exact dataset for both trainings.
I am using the following command to train:
!python stylegan3/train.py --outdir "/content/runs" --cfg stylegan3-t --data "/content/anime-faces.zip" --batch-gpu 16 --gpus 1 --batch 32 --snap 5 --gamma 2 --metrics none --resume "/content/network.pkl"
To Reproduce Steps to reproduce the behavior:
Fakes generated where it left of on Colab:
Fakes generated after resuming on a different GPU on Kaggle:
Expected behavior The generated fakes on the different GPU should be the same as where I left of at Colab.
Desktop (please complete the following information):
Additional context I only resumed via the .pkl file generated by Colab and nothing else.