nikky4D closed this issue 3 years ago.
I came here to ask the exact same thing.
I can run fastai locally just fine on Windows with num_workers = 0, but it is painfully slow. The GPU load keeps fluctuating, and training takes 3-4x longer than it should.
Is there any tutorial, guide, or set of best practices for running fastai on Windows? I am not asking about installation, CUDA, or anything like that; my only issue is with the dataloader.
On a side note, I managed to run almost the same pipeline using PyTorch Lightning by wrapping my call inside
if __name__ == '__main__':
    main_train_loop()
but this didn't work with fastai for me. I keep getting pickling errors related to my augmentation functions (I am not using any lambda functions).
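On Windows, Python's multiprocessing only supports the spawn start method, so everything handed to DataLoader workers (dataset, transforms, augmentation functions) has to be pickled. Pickle stores a function by reference, as its module plus qualified name, so functions defined inside another function fail just like lambdas do, even when no lambda appears in the code. One plausible cause of the errors above, not a confirmed diagnosis, can be checked with a minimal stdlib sketch; `is_picklable`, `flip_horizontal`, and `make_local_aug` are hypothetical names for illustration, not fastai APIs.

```python
import pickle
from functools import partial


def is_picklable(obj):
    """Return True if obj survives a pickle round-trip, as spawn-based workers require."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical augmentation defined at module top level: pickled by reference, so workers can find it.
def flip_horizontal(rows):
    return [list(reversed(r)) for r in rows]


def make_local_aug():
    # Defined inside another function: pickle cannot look it up by name later.
    def local_flip(rows):
        return [list(reversed(r)) for r in rows]
    return local_flip


candidates = {
    "top-level function": flip_horizontal,
    "partial of top-level function": partial(flip_horizontal),
    "nested function": make_local_aug(),
    "lambda": lambda rows: rows,
}

for name, fn in candidates.items():
    print(f"{name}: {'ok' if is_picklable(fn) else 'NOT picklable'}")
```

Moving any nested helpers to module top level (optionally wrapping them with `functools.partial` to bind parameters) is the usual way to make them worker-safe.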
Thanks in advance, any tip would be helpful.
A tutorial / sample would be good. @coldfir3, can you elaborate on what you did with PyTorch Lightning?
Of course. This is the code I used to train: https://colab.research.google.com/drive/1gJ0sT5wCBbJRLRU9htfyKSIW73y4jtEn?usp=sharing. For some reason it won't work in Jupyter, so I had to save it as a .py file and run it with 'python code.py'.
Thanks for sharing. This is really helpful for me.
@muellerzr Do you have any advice for speeding up fastai on Windows? The sample code does not work for me.
Sadly, I have not. I could not make fastai work fast (sorry for the joke) on Windows. In the end I installed Ubuntu, and whenever I need to run something locally I just switch OS... It is a pain, but better than using 5% of my GPU when training with a single worker. :)
I am going that route as well. Thanks for the links.
Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug (delete one): YES
Describe the bug
In my image dataloader, using num_workers > 0 (here num_workers = 2) together with multiprocessing.set_start_method('spawn'), following the Windows script example, I get the error "THCudaCheck". When I set num_workers = 0, the model builds and trains. The error occurs only when fit_one_cycle() is called.
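The script layout this report follows can be sketched with the standard library alone. With the spawn start method, each worker process re-imports the main module, so setting the start method and anything that creates workers belong inside the `if __name__ == '__main__':` guard, and per-item functions must live at module top level. In this sketch a `multiprocessing.Pool` with a hypothetical `NUM_WORKERS = 2` stands in for a DataLoader's worker processes and `load_item` is a hypothetical per-item transform; it illustrates the mechanics under those assumptions and is not a fix for the THCudaCheck error itself.

```python
import multiprocessing as mp

NUM_WORKERS = 2  # mirrors num_workers = 2 from the report


def load_item(i):
    # Hypothetical per-item work a DataLoader worker would do; top-level so it is picklable.
    return i * i


def main_train_loop():
    # Workers are created here, never at import time, so spawned children
    # that re-import this module do not start processes of their own.
    with mp.Pool(processes=NUM_WORKERS) as pool:
        batch = pool.map(load_item, range(8))
    return batch


if __name__ == "__main__":
    # spawn is the only start method on Windows; set explicitly so other OSes behave the same.
    mp.set_start_method("spawn", force=True)
    print(main_train_loop())
```

Because spawned workers re-import the main module from its file, this pattern needs to run as a script (`python code.py`). Functions defined in notebook cells generally cannot be looked up by spawned workers, which may explain why the Lightning code only worked outside Jupyter.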
Error with full stack trace