Open Youskrpig opened 1 year ago
I also tried model = torchvision.models.resnet50(pretrained=True); in the forward function, its output is normal.
This looks a bit strange, as we have not encountered this issue before. Would you be able to provide more information? This would be helpful for us to pinpoint the issue. Thank you!
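What I mean is that the forward pass during training always takes up memory on GPU 0. I have two GPUs, and even when I explicitly place both the input and the model on GPU 1 (I confirmed that the weights and biases are all on GPU 1), a small amount of memory is still used on GPU 0. If GPU 0's memory is already full, an error is raised. So I'd like to ask: is there somewhere in the CUDA-compiled code that uses GPU 0?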
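One possible explanation, which is an assumption and not something confirmed in this thread, is that some CUDA code runs on the current device, which defaults to GPU 0, rather than on the device the tensors live on. A minimal sketch of making GPU 1 the current device before building the model follows; pin_to_gpu is a hypothetical helper name, and the sketch falls back to the CPU when PyTorch or the requested GPU is unavailable.

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def pin_to_gpu(index):
    """Hypothetical helper: make GPU `index` the current CUDA device.

    Returns a device string, or "cpu" if the GPU cannot be used.
    """
    if torch is None or not torch.cuda.is_available():
        return "cpu"
    if index >= torch.cuda.device_count():
        return "cpu"
    # Code that launches on the "current device" will now target
    # GPU `index` instead of the default GPU 0.
    torch.cuda.set_device(index)
    return f"cuda:{index}"


device = pin_to_gpu(1)
print(device)
```

The model and inputs would then be moved with .to(device) as before. Whether this removes the GPU 0 allocation depends on how the extension selects its device.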
What about setting CUDA_VISIBLE_DEVICES=1 directly?
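A minimal sketch of that suggestion, assuming a two-GPU machine: the variable has to be set before CUDA is initialized, so the safest place is before importing torch (or on the command line when launching the script). Once it is set, the process sees only physical GPU 1, and that GPU is addressed as cuda:0 inside the process.

```python
import os

# Must be set before the first CUDA call; doing it before the torch
# import is the safest option.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    # Physical GPU 1 is now the only visible device, exposed as cuda:0.
    device = torch.device("cuda:0")
    print("visible GPUs:", torch.cuda.device_count())
    print("using:", torch.cuda.get_device_name(device))
else:
    print("CUDA not available; nothing to check")
```

With this setting GPU 0 is hidden from the process, so no code in it, including compiled extensions, can allocate memory there.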
I tried to check whether the model could run when GPU 1 is empty, but an error occurs in the forward function: