Ran into a missing-implementation issue with `resnet34` training. The same code runs on CPU and CUDA without a problem.
torch version: 2.1.0.dev20230321+cpu
torchvision version: 0.16.0.dev20230321+cpu
Also noticed that `aten::_copy_from_and_resize` is missing, but I suppose a separate issue is required.
Update: to be fair, the function isn't used by the `resnet34` training itself, but by the `torch.max` call made alongside it. Moving all auxiliary operations to the CPU made training succeed, though GPU utilization seemed a bit low, at around 30%.
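For anyone hitting the same thing, here is a minimal sketch of what "moving the auxiliary operations to CPU" could look like. It assumes `torch.max` is used to turn logits into predicted classes for an accuracy metric; the helper name `accuracy_on_cpu` and the stand-in tensors are hypothetical and not taken from the original training script.

```python
import torch


def accuracy_on_cpu(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Top-1 accuracy with torch.max executed on the CPU (hypothetical helper)."""
    # Detach and copy to the CPU so torch.max is never dispatched to the
    # accelerator backend that lacks the implementation.
    logits_cpu = logits.detach().cpu()
    targets_cpu = targets.detach().cpu()
    # With a dim argument, torch.max returns (values, indices);
    # the indices are the predicted class ids.
    _, predicted = torch.max(logits_cpu, dim=1)
    return (predicted == targets_cpu).float().mean().item()


# Stand-in tensors; in a real loop these would come from the model
# and the data loader on the accelerator device.
logits = torch.tensor([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.7],
])
targets = torch.tensor([1, 0, 2, 1])
acc = accuracy_on_cpu(logits, targets)
```

Note that `.cpu()` forces a device-to-host copy on every call, which may be one reason for lower GPU utilization, though I haven't profiled it.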