Closed elo0i closed 11 months ago
Try one of the following images instead of latest:
pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime
pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime
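After switching images, it can help to confirm what the container actually ships. This is a minimal sketch, not part of the repo: `describe_environment` is a hypothetical helper name, and it only reads version attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`). It falls back to `None` values if torch is not installed.

```python
import importlib.util
import platform


def describe_environment():
    """Collect version details that can differ between Docker image tags."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    # Skip the torch lookups when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version torch was built with
    info["cuda_available"] = torch.cuda.is_available()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this once under :latest and once under a pinned tag shows which torch and CUDA builds differ between the two.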
Oh my god, thank you so much! It's working now with "pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime" (training started and seems to be going well).
It's strange, because pytorch:latest works when I use this repo the normal way with the Dreambooth Jupyter notebook, but when I run a script based on the notebook with my fork (I only modified setup_training.py and download_model.py so they take their arguments from my script instead of working as a form), I get the out-of-memory error. Maybe it's because my script consumes more memory and :1.13.1 consumes less than :latest?
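One way to test that guess is to record GPU memory use right before training starts under each image and compare. The sketch below is only an assumption about how one might do that: it shells out to `nvidia-smi` with its CSV query mode, and `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names, not part of the repo.

```python
import shutil
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' rows (MiB) as printed by nvidia-smi in CSV mode."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...] per GPU, or [] if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_memory_csv(result.stdout)


for index, (used, total) in enumerate(query_gpu_memory()):
    print(f"GPU {index}: {used} MiB used of {total} MiB")
```

If the script run already shows noticeably more memory in use than the notebook run at the same point, the script is the likelier cause. If the numbers match, the image is.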
Why am I getting out of memory? I made a script.py that is a copy of the "dreambooth_joepenna.ipynb" notebook, and everything goes well until training is about to start. Why? I am running it on 3090 instances created on vast.ai with pytorch:latest.
I also modified download_model.py and setup_training.py to NOT work as a form and instead accept args from my script.py, as you can see in the fork I made.
Any idea how to fix this?
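For context, replacing the notebook form with script arguments could look roughly like the sketch below. It is not the fork's actual code: `parse_training_args` is a hypothetical name, and the option names and defaults are taken from the config keys printed in the log in this post.

```python
import argparse


def parse_training_args(argv=None):
    """Read the values the notebook form would otherwise collect."""
    parser = argparse.ArgumentParser(description="Dreambooth training settings")
    parser.add_argument("--project-name", required=True)
    parser.add_argument("--token", required=True)
    parser.add_argument("--class-word", default="person")
    parser.add_argument("--max-training-steps", type=int, default=4000)
    parser.add_argument("--save-every-x-steps", type=int, default=500)
    parser.add_argument("--learning-rate", type=float, default=1e-6)
    parser.add_argument("--seed", type=int, default=23)
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_training_args(["--project-name", "7777777", "--token", "TMF"])
print(vars(args))
```

Passing an explicit list keeps the function easy to call from another script, which is the setup described here.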
This is the ERROR/LOG:
gpu_vram: 23.69 GB
{
  "class_word": "person",
  "config_date_time": "2023-07-26T18-37-45",
  "debug": false,
  "flip_percent": 0.0,
  "gpu": 0,
  "learning_rate": 1e-06,
  "max_training_steps": 4000,
  "model_path": "sd_v1-5_vae.ckpt",
  "model_repo_id": "panopstor/EveryDream",
  "project_config_filename": "2023-07-26T18-37-45-7777777-config.json",
  "project_name": "7777777",
  "regularization_images_folder_path": "Stable-Diffusion-Regularization-Images-person_ddim/person_ddim",
  "save_every_x_steps": 500,
  "schema": 1,
  "seed": 23,
  "token": "TMF",
  "token_only": false,
  "training_images": [
    "00003.png", "00004.png", "00005.png", "00006.png", "00007.png",
    "00008.png", "00009.png", "00010.png", "00011.png", "00012.png",
    "00013.png", "00014.png", "00015.png", "00016.png", "00017.png",
    "00018.png", "00019.png", "00020.png", "00021.png", "00022.png"
  ],
  "training_images_count": 20,
  "training_images_folder_path": "./training_images"
}
✅ 2023-07-26T18-37-45-7777777-config.json successfully generated. Proceed to training.
entrenendo??????????????????????????????????????????
Global seed set to 23
gpu_vram: 23.69 GB
Loading model from sd_v1-5_vae.ckpt