cvlab-stonybrook / PathLDM

Official Code for PathLDM: Text conditioned Latent Diffusion Model for Histopathology (WACV 2024)

Model dtype, data dtype #16

Closed. AhmadObeid closed this issue 7 months ago.

AhmadObeid commented 7 months ago

Hi, thank you, authors, for sharing your code with us. I am facing the following error when attempting to run main.py with the provided config file:

RuntimeError: Found dtype Float but expected Half

This is similar to the problem discussed in issue #7, but that thread did not resolve it. Things I tried:

1) Made sure the requirements are satisfied by creating the conda environment as instructed.

2) Tried to convert the model to fp16 using model.half() or model.to(dtype=torch.float16), but I kept getting the error:

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

To that end, I also tried to make the data dtype match by placing the line image = torch.from_numpy(image).to(dtype=torch.float16) right before the return in the __getitem__() of TCGADataset(), but to no avail.

3) I wanted to try torch.cuda.amp.autocast(), but I discovered that it is already used in ddpm.py line 1136:

with torch.cuda.amp.autocast():
    x_recon = self.model(x_noisy, t, **cond)

I really searched everywhere and couldn't find any solution. Your help would be greatly appreciated.
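
For readers hitting the same wall: the "Input type (float) and bias type (c10::Half)" error above is the generic mismatch raised when a half-precision layer receives a float32 tensor. A minimal standalone reproduction (independent of this repo; a CUDA GPU is assumed) would be roughly:

import torch
import torch.nn as nn

# Convert only the layer to fp16 while the input stays fp32,
# which reproduces the dtype-mismatch RuntimeError described above.
conv = nn.Conv2d(3, 8, kernel_size=3).cuda().half()  # weights/bias now fp16
x = torch.randn(1, 3, 64, 64, device="cuda")         # input still fp32
out = conv(x)  # raises a RuntimeError: input and parameter dtypes differ

Either the input has to be cast to fp16 as well, or the forward pass has to run under autocast so the cast happens automatically.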

AhmadObeid commented 7 months ago

After searching and trying many things, the solution turned out to be simple. Here it is in case anyone else gets stuck: all you have to do is wrap the trainer.fit() call in main.py (line 759) in autocast():

with torch.cuda.amp.autocast():
    trainer.fit(model, data)

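For context, a sketch of the change as it might sit in main.py (trainer, model, and data are the objects the script already builds; the import is assumed at the top of the file). Note that autocast on its own only controls forward-pass dtypes and does not add gradient scaling, which Lightning's precision: 16 route in the next comment would handle for you:

import torch

# trainer, model and data are constructed earlier in main.py.
# In a single-process run, everything executed inside fit() now runs under
# autocast, so CUDA ops pick fp16/fp32 per op instead of hitting a hard
# dtype mismatch.
with torch.cuda.amp.autocast():
    trainer.fit(model, data)
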
srikarym commented 7 months ago

You could also add precision: 16 to the last line of the training config (it is passed as an additional argument to the PyTorch Lightning Trainer constructor):

https://pytorch-lightning.readthedocs.io/en/1.2.10/advanced/amp.html#gpu-16-bit
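
For anyone editing the YAML directly: since the config's trainer keys end up as keyword arguments to the Trainer constructor (as noted above), precision: 16 is roughly equivalent to building the trainer like this sketch. The arguments other than precision are placeholders, not values taken from the repo's config:

import pytorch_lightning as pl

# precision=16 enables Lightning's native mixed precision (autocast + GradScaler),
# which is what the `precision: 16` config key maps to.
trainer = pl.Trainer(
    gpus=1,          # placeholder: match your own hardware setup
    max_epochs=100,  # placeholder value
    precision=16,
)
trainer.fit(model, data)  # model and data as constructed by main.py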

windygoo commented 3 weeks ago

> You could also add precision: 16 to the last line of the training config (it is passed as an additional argument to the PyTorch Lightning Trainer constructor):
>
> https://pytorch-lightning.readthedocs.io/en/1.2.10/advanced/amp.html#gpu-16-bit

Adding this parameter raises another error:

RuntimeError: expected scalar type Half but found Float