Closed: xxxiaol closed this issue 3 years ago.
Could you try using a smaller batch size? And how much GPU memory does your system have?
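(In case it helps anyone debugging this: the GPU memory can be checked with a short, generic PyTorch snippet, or with `nvidia-smi` on the command line. This is not repo code, just a standard check.)

```python
import torch

# Report total and currently allocated memory for each visible GPU.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024**3
    used_gb = torch.cuda.memory_allocated(i) / 1024**3
    print(f"GPU {i} ({props.name}): {total_gb:.1f} GB total, "
          f"{used_gb:.2f} GB allocated by this process")
```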
Thanks for your reply! Unfortunately, smaller batch sizes such as 16 don't work either. The problem may be due to different DGL and CUDA versions, since the code runs smoothly in another environment with CUDA 11.2.
Hello! Do you remember which DGL and torch versions you used with CUDA 11.2? When I ran with CUDA 11.1 and torch 1.6-1.8 on a Tesla A100, some of the code errored (in both pretrain and train). Did you have to change any code to run with CUDA 11.x? Thank you a lot!
Hi! No, I haven't tested with CUDA 11. Please run the code with CUDA 10.1; that is the version that was tested.
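(For anyone comparing their environment against the tested one, a quick way to print the installed versions; these are the standard version attributes, nothing repo-specific.)

```python
import torch
import dgl

# Generic version check, useful for matching the tested setup (CUDA 10.1).
print("torch:", torch.__version__)
print("torch built for CUDA:", torch.version.cuda)
print("dgl:", dgl.__version__)
```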
Hello Woojeong,
When I run pretrain.py as described in the instructions, it hits an out-of-memory (OOM) error in the first batch.
I wonder why the update step needs so much memory. Could you please help? Thanks a lot! By the way, my DGL version is dgl-cu102 (I don't know whether that difference causes the error).
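(Not from the repo, but a common workaround when the update step OOMs is gradient accumulation: keep the effective batch size while splitting each optimizer step over several smaller forward/backward passes. A minimal sketch follows; `model`, `optimizer`, `loss_fn`, and `loader` here are placeholders standing in for the actual objects in pretrain.py.)

```python
import torch
import torch.nn as nn

# Placeholder model and synthetic data; substitute the real ones from pretrain.py.
model = nn.Linear(128, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(16, 128).cuda(), torch.randint(0, 2, (16,)).cuda())
          for _ in range(8)]

accum_steps = 4  # effective batch = 16 * 4 = 64, but peak memory is that of 16

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(loader):
    loss = loss_fn(model(inputs), labels)
    # Scale the loss so accumulated gradients match one large-batch step.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```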