NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

The SDXL Infer output image is full of noise #10938

Open blacklong28 opened 5 days ago

blacklong28 commented 5 days ago

Describe the bug

I followed the tutorial to convert a model downloaded from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/diffusion_pytorch_model.safetensors. I only converted the PyTorch model to a NeMo model and then ran inference on the NeMo model, but the output image is full of noise. Is there a bug, or am I doing something wrong?

Steps/Code to reproduce bug

I just followed the tutorial. No quantized model is needed: the NeMo model is converted and the sd_xl_infer.py script is used directly for inference. The quantized model gives the same result.

  1. Download the model:

    mkdir -p /sdxl_ckpts/stable-diffusion-xl-base-1.0/unet && wget -P /sdxl_ckpts/stable-diffusion-xl-base-1.0/unet https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/diffusion_pytorch_model.safetensors
    mkdir -p /sdxl_ckpts/stable-diffusion-xl-base-1.0/vae && wget -P /sdxl_ckpts/stable-diffusion-xl-base-1.0/vae https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae/diffusion_pytorch_model.safetensors
  2. Convert the safetensors file to a NeMo model:

    python3 /opt/NeMo/examples/multimodal/text_to_image/convert_hf_ckpt_to_nemo.py \
    --model_type sdxl \
    --ckpt_path /sdxl_ckpts/stable-diffusion-xl-base-1.0/unet/diffusion_pytorch_model.safetensors \
    --hparams_file /opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/conf/sd_xl_base_train.yaml \
    --nemo_file_path $WORKDIR/sdxl_base.nemo
  3. Run inference with the NeMo model:

    python3 /opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py model.restore_from_path=/sdxl_base.nemo out_path=/sdxl_infer_out

Expected behavior

The script should produce a normal image rather than pure noise.

Environment details

Additional context

The output image: [Image]

blacklong28 commented 2 days ago

@Victor49152 Hi, can you help me? I used the config provided in NeMo/examples/multimodal/text_to_image/stable_diffusion/conf and followed the tutorials. I also ran sd_xl_infer.py inside the NeMo Docker container, and the output image is still noise.

Victor49152 commented 2 days ago

Thanks for your post. Could you please check the log from running 'convert_hf_ckpt_to_nemo.py'? I think you might see some warnings about unexpected keys and missing keys.

Some layer names might have changed in NeMo, so the conversion script is not mapping the keys properly. Please let me know if that is the case and I will try to update the conversion script. Thanks.
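
For anyone who wants to run the same check outside the conversion log, here is a minimal sketch of comparing the keys a model expects against the keys a converted checkpoint provides. It uses plain Python lists in place of real state dicts; the helper name `diff_state_dict_keys` and the key names are made up for illustration and are not taken from NeMo or the SDXL checkpoint.

```python
def diff_state_dict_keys(expected_keys, loaded_keys):
    """Return which keys are missing from, or unexpected in, a checkpoint."""
    expected, loaded = set(expected_keys), set(loaded_keys)
    return {
        # Keys the model defines but the checkpoint does not provide.
        "missing": sorted(expected - loaded),
        # Keys the checkpoint provides but the model does not define.
        "unexpected": sorted(loaded - expected),
    }


# Illustrative key names only (assumption, not real SDXL layer names).
model_keys = ["input_blocks.0.0.weight", "out.2.bias"]
ckpt_keys = ["input_blocks.0.0.weight", "conv_out.bias"]

report = diff_state_dict_keys(model_keys, ckpt_keys)
print(report)
```

With a real model, the two lists would come from something like `model.state_dict().keys()` and the keys of the loaded checkpoint dictionary. A layer that ends up in the "missing" list keeps its random initialization, which would be consistent with a noisy output image.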

blacklong28 commented 2 days ago

Thanks for your reply. This is the log from running 'convert_hf_ckpt_to_nemo.py': convert_nemo_test.log. I also saw some missing and unexpected keys in the SDXL Quantization.ipynb notebook you provided. I thought they were normal, so I didn't pay much attention to them. Could you check whether they are normal? Thank you.

Victor49152 commented 1 day ago

This conversion script is obsolete. Can you try https://github.com/NVIDIA/NeMo/blob/main/scripts/checkpoint_converters/convert_stablediffusion_hf_to_nemo.py as the converter and https://github.com/NVIDIA/NeMo/blob/409f1d847ff53a66e56763da3a83e2980e9afe53/examples/multimodal/text_to_image/stable_diffusion/conf/sd_xl_infer_v2.yaml as the new inference config?

Let me know if these work for you. I will update the notebook later.

blacklong28 commented 1 day ago

I used this script (https://github.com/NVIDIA/NeMo/blob/main/scripts/checkpoint_converters/convert_stablediffusion_hf_to_nemo.py) to convert the safetensors model to nemo.ckpt. I noticed that the script saves the model with torch.save, so the result is a .ckpt file, not a .nemo model. I then used this config and command:

    model_cfg.unet_config.from_pretrained = "/opt/NeMo/nemo_out/sdxl_base_new_test1023A_nemo.ckpt"
    model_cfg.unet_config.from_NeMo = True
    model_cfg.first_stage_config.from_pretrained = "/opt/NeMo/nemo_out/sdxl_vae_new_test1023A_nemo.ckpt"
    model_cfg.first_stage_config.from_NeMo = True

    python3 /opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py model.restore_from_path=/opt/NeMo/nemo_out/sdxl_base_new_test1023A_nemo.ckpt  out_path=/opt/NeMo/infer_out

I got the error:

root@d392d2e1fa20:~# python3 /opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py model.restore_from_path=/opt/NeMo/nemo_out/sdxl_base_new_test1023A_nemo.ckpt  out_path=/opt/NeMo/infer_out
[NeMo I 2024-10-23 03:19:48 utils:285] FSDP is False, using DDP strategy.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo W 2024-10-23 03:19:48 utils:333] Loading from .ckpt checkpoint for inference is experimental! It doesn't support models with model parallelism!
Error executing job with overrides: ['model.restore_from_path=/opt/NeMo/nemo_out/sdxl_base_new_test1023A_nemo.ckpt', 'out_path=/opt/NeMo/infer_out']
Traceback (most recent call last):
  File "/opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py", line 37, in main
    trainer, megatron_diffusion_model = setup_trainer_and_model_for_inference(
  File "/opt/NeMo/nemo/collections/multimodal/parts/utils.py", line 337, in setup_trainer_and_model_for_inference
    model = model_provider.load_from_checkpoint(
  File "/opt/NeMo/nemo/collections/nlp/models/nlp_model.py", line 385, in load_from_checkpoint
    model = ptl_load_state(cls, checkpoint, strict=strict, cfg=cfg, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/saving.py", line 158, in _load_state
    obj = cls(**_cls_kwargs)
  File "/opt/NeMo/nemo/collections/multimodal/models/text_to_image/stable_diffusion/diffusion_engine.py", line 367, in __init__
    super().__init__(cfg, trainer=trainer)
  File "/opt/NeMo/nemo/collections/nlp/parts/mixins/nlp_adapter_mixins.py", line 88, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 118, in __init__
    with open_dict(cfg):
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
AttributeError: 'dict' object has no attribute '_get_node_flag'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Are there any other parameters or code I need to change here?

If I instead save the converted model under a file name with the .nemo suffix, sd_xl_infer.py fails when loading it:

root@d392d2e1fa20:~# python3 /opt/NeMo/scripts/checkpoint_converters/convert_stablediffusion_hf_to_nemo.py --input_name_or_path /sdxl_ckpts/stable-diffusion-xl-base-1.0/unet/ --output_path /opt/NeMo/nemo_out/sdxl_base_new_test1023A.nemo --model unet --debug
[NeMo I 2024-10-23 02:22:15 convert_stablediffusion_hf_to_nemo:413] loading checkpoint /sdxl_ckpts/stable-diffusion-xl-base-1.0/unet/
[NeMo I 2024-10-23 02:22:15 convert_stablediffusion_hf_to_nemo:418] converting unet...
[NeMo I 2024-10-23 02:22:15 convert_stablediffusion_hf_to_nemo:268] Add embedding found...
[NeMo I 2024-10-23 02:22:15 convert_stablediffusion_hf_to_nemo:273] Time embedding found...
[NeMo I 2024-10-23 02:23:16 convert_stablediffusion_hf_to_nemo:447] Saved nemo file to /opt/NeMo/nemo_out/sdxl_base_new_test1023A.nemo
root@d392d2e1fa20:~# python3 /opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py model.restore_from_path=/opt/NeMo/nemo_out/sdxl_base_new_test1023A.nemo  out_path=/opt/NeMo/infer_out
[NeMo I 2024-10-23 02:24:52 utils:285] FSDP is False, using DDP strategy.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Error executing job with overrides: ['model.restore_from_path=/opt/NeMo/nemo_out/sdxl_base_new_test1023A.nemo', 'out_path=/opt/NeMo/infer_out']
Traceback (most recent call last):
  File "/usr/lib/python3.10/tarfile.py", line 1870, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.10/tarfile.py", line 1847, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.10/tarfile.py", line 1707, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.10/tarfile.py", line 2622, in next
    raise e
  File "/usr/lib/python3.10/tarfile.py", line 2595, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.10/tarfile.py", line 1285, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib/python3.10/gzip.py", line 301, in read
    return self._buffer.read(size)
  File "/usr/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'PK')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/NeMo/examples/multimodal/text_to_image/stable_diffusion/sd_xl_infer.py", line 37, in main
    trainer, megatron_diffusion_model = setup_trainer_and_model_for_inference(
  File "/opt/NeMo/nemo/collections/multimodal/parts/utils.py", line 314, in setup_trainer_and_model_for_inference
    model_cfg = model_provider.restore_from(
  File "/opt/NeMo/nemo/collections/nlp/models/nlp_model.py", line 478, in restore_from
    return super().restore_from(
  File "/opt/NeMo/nemo/core/classes/modelPT.py", line 468, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/opt/NeMo/nemo/collections/nlp/parts/nlp_overrides.py", line 1298, in restore_from
    loaded_params = super().load_config_and_state_dict(
  File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 148, in load_config_and_state_dict
    members = self._filtered_tar_info(restore_path, filter_fn=filter_fn)
  File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 622, in _filtered_tar_info
    with SaveRestoreConnector._tar_open(tar_path) as tar:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 661, in _tar_open
    tar = tarfile.open(path2file, tar_header)
  File "/usr/lib/python3.10/tarfile.py", line 1817, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib/python3.10/tarfile.py", line 1874, in gzopen
    raise ReadError("not a gzip file") from e
tarfile.ReadError: not a gzip file

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
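
A note on the `b'PK'` in the first traceback above: those are the leading bytes of a zip archive, which is the container torch.save has written by default since PyTorch 1.6, while the restore path shown in the traceback opens the file with tarfile. So the .nemo suffix by itself does not change what is inside the file. Below is a minimal sketch for checking what a checkpoint file actually is, using only the Python standard library. `sniff_archive_format` is a hypothetical helper name, not part of NeMo, and the demo files and member names are throwaway stand-ins rather than real checkpoints.

```python
import io
import os
import tarfile
import tempfile
import zipfile


def sniff_archive_format(path):
    """Guess the container format of a file from its leading bytes."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"PK":        # zip local-file-header signature
        return "zip"
    if magic == b"\x1f\x8b":  # gzip signature
        return "gzip"
    if tarfile.is_tarfile(path):  # uncompressed tar has no leading magic
        return "tar"
    return "unknown"


# Demo on throwaway files (illustrative names, not real model files).
tmpdir = tempfile.mkdtemp()
payload = b"weights"

# A zip archive that merely carries a .nemo suffix.
zip_path = os.path.join(tmpdir, "looks_like.nemo")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("data.pkl", payload)

# An uncompressed tar archive with the same suffix.
tar_path = os.path.join(tmpdir, "tar_archive.nemo")
with tarfile.open(tar_path, "w") as tf:
    info = tarfile.TarInfo(name="model_weights.ckpt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

zip_kind = sniff_archive_format(zip_path)
tar_kind = sniff_archive_format(tar_path)
print(zip_kind, tar_kind)
```

Running `sniff_archive_format` on the converter output before passing it to `model.restore_from_path` would show whether the file is something the tar-based restore can open at all.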