Hi,
Thanks for your excellent work.
When I tried to reproduce the fine-tuning result of MAE-ViT-L on an 8-GPU machine, I ran into the following error:
File "/home/ywlee/SlowFast/slowfast/models/video_model_builder.py", line 1215, in forward
) + torch.repeat_interleave(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
My command is:
The only option I changed is DATA.DECODING_BACKEND, which I set to pyav because torchvision results in an error.
My environment:
python==3.8 torch==1.12.0 torchvision==0.11.1
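One thing that may be worth checking: to my knowledge, torch 1.12.0 was released alongside torchvision 0.13.0, while torchvision 0.11.1 belongs to the torch 1.10.0 release, so the two versions listed above do not look like a matched pair. Whether this is related to the error is not established. Below is a minimal sketch of such a check; the `KNOWN_PAIRS` table and the helper names are illustrative assumptions (only a few rows are listed), not part of any library.

```python
# Hypothetical helper: compare a torch/torchvision pair against a small,
# hand-written table of release pairings. The table is an assumption made
# for illustration and is intentionally incomplete.
KNOWN_PAIRS = {
    "1.10.0": ("0.11.0", "0.11.1"),
    "1.11.0": ("0.12.0",),
    "1.12.0": ("0.13.0",),
    "1.12.1": ("0.13.1",),
}


def strip_local(version):
    """Drop a local build suffix such as '+cu113'."""
    return version.split("+", 1)[0]


def versions_match(torch_version, torchvision_version):
    """Return True or False for a torch version in the table, None otherwise."""
    expected = KNOWN_PAIRS.get(strip_local(torch_version))
    if expected is None:
        return None  # torch version not covered by this small table
    return strip_local(torchvision_version) in expected


# The pair reported in this environment.
print(versions_match("1.12.0", "0.11.1"))
```

In a live environment the two strings could be taken from `torch.__version__` and `torchvision.__version__` instead of being typed in by hand.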
In this environment, I was able to train with k400_VIT_B_16x4_MAE_PT.yaml. However, when I tried to fine-tune with k400_VIT_B_16x4_FT.yaml, the same error as above occurred.
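Since the error message notes that the stack trace may be inaccurate and suggests CUDA_LAUNCH_BLOCKING=1, a rerun with that variable set should point at the kernel that actually fails. Device-side asserts are often caused by an out-of-range index, but that is only a guess here. The sketch below shows one way to launch a command with the variable set; `blocking_env` and `run_with_blocking` are hypothetical helper names, and the demo command is a placeholder, not the actual training command.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    # Makes CUDA kernel launches synchronous, so the Python stack trace
    # points at the call that actually triggered the assert.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking(cmd):
    """Run cmd (a list of arguments) with the variable set; return its exit code."""
    return subprocess.run(cmd, env=blocking_env(), check=False).returncode


if __name__ == "__main__":
    # Placeholder command: the child process just prints the variable.
    # Replace it with the real training command.
    demo_cmd = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])",
    ]
    run_with_blocking(demo_cmd)
```

Synchronous launches slow training down considerably, so this is only meant for a short debugging run.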