AhmadObeid closed this issue 7 months ago
After searching and trying many things, the solution turned out to be simple. Here it is in case anyone gets stuck: all you have to do is wrap the trainer.fit() call in main.py (line 759) with the autocast() context manager:
with torch.cuda.amp.autocast():
trainer.fit(model, data)
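For anyone who wants to see why this helps outside the repo, here is a minimal, self-contained sketch (toy model and tensors, not the repo's code) showing that under torch.cuda.amp.autocast() fp32 weights and fp16 inputs no longer clash, because eligible ops are run in a common dtype. It needs a CUDA GPU to run.

```python
import torch
import torch.nn as nn

# Toy illustration (requires a CUDA GPU): fp32 weights meet an fp16 input.
model = nn.Linear(8, 4).cuda()                   # parameters stay float32
x = torch.randn(2, 8, device="cuda").half()      # float16 input

# Outside autocast this mix raises a dtype-mismatch RuntimeError;
# inside autocast the linear op is executed in float16 automatically.
with torch.cuda.amp.autocast():
    y = model(x)

print(y.dtype)  # torch.float16
```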
You could also add precision: 16 to the end of the training config (it is passed as an additional argument to the PyTorch Lightning Trainer constructor):
https://pytorch-lightning.readthedocs.io/en/1.2.10/advanced/amp.html#gpu-16-bit
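In code, that config change corresponds to something like the sketch below, using the PyTorch Lightning 1.2.x API described in the linked docs. The gpus value and the exact way main.py builds the Trainer from the config are assumptions.

```python
import pytorch_lightning as pl

# Sketch only: in the repo the Trainer kwargs come from the YAML config,
# so adding `precision: 16` under the trainer section has the same effect.
trainer = pl.Trainer(
    gpus=1,         # assumption: single-GPU training
    precision=16,   # native AMP: autocast + gradient scaling
)
# trainer.fit(model, data)  # model and data are built elsewhere in main.py
```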
Adding the precision: 16 parameter raises another error:
RuntimeError: expected scalar type Half but found Float
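One way to narrow down where that mismatch comes from (illustrative only, not from the repo) is to print the batch and parameter dtypes just before training, for example:

```python
import torch
import torch.nn as nn

def report_dtypes(model: nn.Module, batch: dict) -> None:
    # Print the dtype of every tensor in the batch and the set of
    # parameter dtypes, so Half/Float mixes are easy to spot.
    for name, value in batch.items():
        if torch.is_tensor(value):
            print(f"batch[{name!r}]: {value.dtype}")
    print("parameter dtypes:", {p.dtype for p in model.parameters()})

# Toy usage with a deliberately mixed pair:
report_dtypes(nn.Linear(4, 2), {"image": torch.randn(1, 4).half()})
```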
Hi, thank you authors for sharing your code with us. I am facing the following error when attempting to run main.py with the same config file provided:
This is similar to the problem discussed in issue #7, yet it has not been resolved. Things I tried:
1) Made sure the requirements are satisfied by creating the conda environment as instructed.
2) Tried to convert the model to fp16 using model.half() or model.to(dtype=torch.float16), but I kept getting the error. To that end, I also tried to make the data dtype match by placing the line
image = torch.from_numpy(image).to(dtype=torch.float16)
right before the return in the __getitem__() of TCGADataset(), but to no avail (see the sketch at the end of this comment for where the cast went).
3) I wanted to try using torch.cuda.amp.autocast(), but I discovered that it has already been implemented in ddpm.py at line 1136.
I really searched everywhere and couldn't find any solution. Your help would be greatly appreciated.
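For reference, a rough, self-contained sketch of where the cast in attempt 2 was placed (the real TCGADataset has more preprocessing and fields; the names here are made up for illustration):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ToyTCGADataset(Dataset):
    """Illustrative stand-in for TCGADataset, only to show the dtype cast."""

    def __init__(self, images):
        self.images = images  # list of HxWxC numpy arrays

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # The cast described above, placed right before the return:
        image = torch.from_numpy(image).to(dtype=torch.float16)
        return {"image": image}

ds = ToyTCGADataset([np.zeros((64, 64, 3), dtype=np.float32)])
print(ds[0]["image"].dtype)  # torch.float16
```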