ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

New alerts and affected performance #203

Open shadowlocked opened 1 year ago

shadowlocked commented 1 year ago

Describe the bug

New alerts now appear after training that were not there even a few days ago:

2023-02-11 07:34:06.889857: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-11 07:34:07.910368: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-11 07:34:07.910479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-11 07:34:07.910500: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

The problems I have had today are not necessarily related to these alerts, but a training session stopped 8k steps ahead of schedule without reporting any error. Even after reloading everything in a new instance of the DreamBooth Colab, I could not use the convert-to-checkpoint script: it appeared to run, but finished within about two seconds and reported where the model was saved, yet no file had been saved. At this point, every necessary library had been downloaded.
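
One way to confirm whether the conversion step actually wrote anything is to check the reported path directly. This is only a minimal sketch: `checkpoint_written` is a hypothetical helper, and `model.ckpt` is a placeholder, not the path the Colab actually uses.

```python
from pathlib import Path


def checkpoint_written(path, min_bytes=1):
    """Return True if `path` is an existing regular file of at least `min_bytes` bytes."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


# Placeholder path (an assumption): substitute the path the conversion script reports.
ckpt_path = "model.ckpt"
if checkpoint_written(ckpt_path):
    print(f"found {ckpt_path}")
else:
    print(f"missing or empty: {ckpt_path}")
```

If this reports the file as missing right after the script claims to have saved it, the conversion is failing silently instead of writing to an unexpected location.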

In short, I am having a lot of trouble getting the script to work today, and I assume the new alerts are the reason.

Reproduction

This does not apply - try it yourself and see if you get the same errors I did.

Logs

No response

System Info

Colab, system info not relevant.