YangLing0818 / SGDiff

Official implementation for "Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training" https://arxiv.org/abs/2211.11138

Testing problems: pretrained model fails to load #6

Open bschroedr opened 10 months ago

bschroedr commented 10 months ago

I am trying to run the inference code (the sampler), and loading the pretrained models fails. I tried some of the suggestions from another submitted issue, but I run into a different error with the vq-f8 model:

LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 395.77 M params.
Keeping EMAs of 630.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Restored from pretrained/vq-f8-model.ckpt with 0 missing and 49 unexpected keys
_pickle.UnpicklingError: invalid load key, '<'.

Can you suggest what to do? I am using the code available as of today.

YangLing0818 commented 10 months ago

Please train the model first, then you can run the inference code.

YangLing0818 commented 10 months ago

vq-f8 is the first-stage model used to embed images into the latent space; it is not the latent diffusion model itself. You need to train the latent diffusion model first, as described in README.MD.
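For intuition, here is a minimal sketch (not code from this repo) of the shape relationship this implies: vq-f8 downsamples images spatially by a factor of 8 into 4-channel latents, which for 256x256 inputs matches the "z of shape (1, 4, 32, 32)" line in the log above.

# Sketch only, not repo code: vq-f8 reduces spatial resolution by 8x and
# produces 4 latent channels, so a 1x3x256x256 image maps to a 1x4x32x32 latent.
import torch

image = torch.randn(1, 3, 256, 256)    # dummy RGB image batch
factor, latent_channels = 8, 4          # vq-f8: downsample x8, 4 latent channels
latent_shape = (image.shape[0], latent_channels,
                image.shape[2] // factor, image.shape[3] // factor)
print(latent_shape)                     # (1, 4, 32, 32)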

bschroedr commented 10 months ago

The previous error ("with 0 missing and 49 unexpected keys _pickle.UnpicklingError: invalid load key, '<'.") turned out to be due to a corrupted download of the sip_vg.pt model. Once I downloaded it again, that problem went away. However, I now get the following error when training with config_vg.yaml:

Traceback (most recent call last):
  File "trainer.py", line 382, in
    model = instantiate_from_config(config.model)
AttributeError: 'int' object has no attribute 'strip'
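As an aside on the earlier unpickling error: an "invalid load key, '<'" almost always means an HTML error page (e.g. a download interstitial) was saved in place of the real checkpoint. A quick way to check a download, as a sketch with the path only as an example:

# Sketch: detect the common cause of "_pickle.UnpicklingError: invalid load key, '<'"
# -- an HTML page saved instead of the actual weights. The path is just an example.
with open("pretrained/sip_vg.pt", "rb") as f:
    head = f.read(64)
if head.lstrip().startswith(b"<"):
    print("Looks like HTML, not a checkpoint -- re-download the file.")
else:
    print("Header looks binary; the download is probably intact.")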

ZerinHwang03 commented 10 months ago

Please check that the versions of all installed packages in your conda environment are consistent with the sgdiff.yaml we provide. We just re-trained the model on a 3090Ti (CUDA 11.4) and did not encounter similar errors.

bschroedr commented 10 months ago

I installed the project exactly as specified, so it wasn't a versioning issue. The actual problem is the following code in trainer.py (line 518):

if not cpu:
    ngpu = len(lightning_config.trainer.gpus.strip(",").split(','))
else:
    ngpu = 1

This code fails with the error I mentioned earlier when only one GPU is specified, because the --gpus value is then parsed as an int rather than a comma-separated string, and an int has no .strip() method. The workaround is to add a trailing comma after the --gpus flag:

python trainer.py --base ./config_vg.yaml -t --gpus 1,
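Alternatively, the check in trainer.py could be made robust to both forms of the flag. A sketch (my rewrite, not the repo's code), assuming the same variables as the snippet above:

# Sketch, not the repo's code: handle --gpus given as an int ("--gpus 1") as
# well as a comma-separated string ("--gpus 1," or "--gpus 0,1").
gpus = lightning_config.trainer.gpus
if cpu:
    ngpu = 1
elif isinstance(gpus, int):
    ngpu = gpus
else:
    ngpu = len(str(gpus).strip(",").split(","))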

The trailing comma solved that problem for me. However, I can't train the model on my two 12 GB GPUs because I get CUDA out-of-memory errors:

RuntimeError: CUDA out of memory. Tried to allocate 252.00 MiB (GPU 0; 11.92 GiB total capacity; 10.66 GiB already allocated; 173.12 MiB free; 10.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
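For reference, the allocator setting that the error message mentions can be applied by setting an environment variable before any CUDA allocation happens; a sketch below (the 128 MB value is only an example, and this mitigates fragmentation rather than reducing total memory use):

# Sketch: apply the allocator hint from the error message. Must run before any
# CUDA memory is allocated (e.g. at the very top of trainer.py, or exported in
# the shell before launching). The value 128 is just an example.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"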

I've tried reducing the batch size to 4 images and still hit this issue. I'm not sure what else to try. Do you have any suggestions?

Could you make a pretrained model available?