Closed · tazlin closed this 1 week ago
@CodiumAI-Agent /describe
feat: latest torch/comfyui; perf improvements; fix: SSL cert issues

New Features/Updates

- Added a `very_fast_disk_mode` configuration option for concurrent model loading. Defaults to `false`. Set `very_fast_disk_mode: false` to only load one model at a time when it is being explicitly preloaded. There are some cases where it still might attempt to load more than one, but it should happen far less often.

Fixes and Improvements

- Improvements to `high_performance_mode`.
- Improvements for `max_threads` values greater than one: `max_threads: 2` and a bit of tuning, plus `high_performance_mode`, if you have a xx90 card. Note that `high_memory_mode` can still lead to additional instability with threads at 2. Consider `max_threads: 2` in SD1.5-only setups without controlnets/post-processing or in other conservative configurations. (See the config sketch after these notes.)
- Images are now encoded from the `rawpng` data directly, reducing redundant operations; the previous approach of `PIL.Image.open(...)` was highly inefficient, especially for very large images.
- Pinned `certifi` to resolve certificate resolution issues.
- Fixed an issue where `download_models.py` would not exit if the compvis models failed to download. This would cause the worker to crash unexpectedly, as it expects the models to be available on worker start.
- Added docker `compose` support (https://github.com/Haidra-Org/horde-worker-reGen/pull/328).
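To make the tuning advice above concrete, here is a minimal sketch of the relevant `bridgeData.yaml` fragment. The four performance options are named in the notes above; `allow_controlnet` and `allow_post_processing` are assumed key names standing in for the "no controlnets/post-processing" advice, and the values shown are illustrative, not a recommended production config:

# Illustrative bridgeData.yaml fragment -- option names from the notes above;
# allow_controlnet/allow_post_processing are assumed, values are examples only.
max_threads: 2                # >1 is described as viable with tuning
high_performance_mode: true   # suggested above for xx90-class cards
high_memory_mode: false       # the notes warn this adds instability at 2 threads
very_fast_disk_mode: false    # default; load one model at a time during preloads
allow_controlnet: false       # assumed key; matches the SD1.5-only, conservative advice
allow_post_processing: false  # assumed key; see above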
PR Type: Enhancement, Bug fix, Documentation, Tests
Description:

- Added `very_fast_disk_mode` configuration option for concurrent model loading.
- Pinned `certifi` to resolve certificate issues.

Relevant files:

| Category | Files |
| --- | --- |
| Enhancement | 11 files |
| Tests | 1 file |
| Documentation | 1 file |
| Dependencies | |
💡 PR-Agent usage: Comment `/help "your question"` on any pull request to receive relevant information.
The docker instructions aren't working for me (Arch Linux / nvidia GPU)
git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
reGen | [notice] A new release of pip is available: 24.0 -> 24.3.1
reGen | [notice] To update, run: pip install --upgrade pip
reGen | 2024-10-30 18:40:57.711 | DEBUG | horde_worker_regen.load_env_vars:load_env_vars_from_config:68 - Using default AI Horde URL.
reGen | 2024-10-30 18:40:57.740 | DEBUG | horde_sdk:_dev_env_var_warnings:42 - AIWORKER_CACHE_HOME is ./models/.
reGen | 2024-10-30 18:40:59.707 | DEBUG | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen | 2024-10-30 18:41:00.050 | DEBUG | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen | 2024-10-30 18:41:00.061 | WARNING | horde_worker_regen.bridge_data.data_model:validate_performance_modes:162 - High memory mode is enabled. You may experience performance issues with more than one thread.
reGen | 2024-10-30 18:41:00.061 | WARNING | horde_worker_regen.bridge_data.data_model:validate_performance_modes:167 - Please let us know if `unload_models_from_vram_often` improves or degrades performance with `high_memory_mode` enabled.
reGen | 2024-10-30 18:41:01.056 | WARNING | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr for model control_qr
reGen | 2024-10-30 18:41:01.056 | WARNING | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr_xl for model control_qr_xl
reGen | 2024-10-30 18:41:01.061 | DEBUG | horde_sdk.ai_horde_worker.model_meta:remove_large_models:155 - Removing cascade models: {'Stable Cascade 1.0'}
reGen | 2024-10-30 18:41:01.061 | DEBUG | horde_sdk.ai_horde_worker.model_meta:remove_large_models:156 - Removing flux models: {'Flux.1-Schnell fp16 (Compact)', 'Flux.1-Schnell fp8 (Compact)'}
reGen | /horde-worker-reGen/venv/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
reGen | warnings.warn(
reGen | 2024-10-30 18:41:02.834 | INFO | horde_safety.deep_danbooru_model:download_deep_danbooru_model:53 - Downloading DeepDanbooru model (~614 mb) to models/clip_blip/model-resnet_custom_v3.pt.
models/clip_blip/model-resnet_custom_v3.pt: 0% 0.00/644M [00:00<?, ?iB/s]
2024-10-30 18:41:03.458 | INFO | horde_safety.deep_danbooru_model:download_deep_danbooru_model:63 - Model already downloaded.
reGen | 2024-10-30 18:41:03.458 | INFO | horde_safety.deep_danbooru_model:verify_deep_danbooru_model_hash:30 - Verifying SHA256 hash of downloaded file.
models/clip_blip/model-resnet_custom_v3.pt: 0% 0.00/644M [00:00<?, ?iB/s]
reGen | Loading CLIP model ViT-L-14/openai...
reGen | /horde-worker-reGen/venv/lib/python3.11/site-packages/open_clip/factory.py:372: UserWarning: These pretrained weights were trained with QuickGELU activation but the model config does not have that enabled. Consider using a model config with a "-quickgelu" suffix or enable with a flag.
reGen | warnings.warn(
reGen | Loaded CLIP model and data in 2.94 seconds.
reGen | 2024-10-30 18:41:06.832 | INFO | hordelib.comfy_horde:do_comfy_import:215 - Forcing normal vram mode
reGen | Traceback (most recent call last):
reGen | File "/horde-worker-reGen/download_models.py", line 25, in <module>
reGen | download_all_models(
reGen | File "/horde-worker-reGen/horde_worker_regen/download_models.py", line 58, in download_all_models
reGen | hordelib.initialise()
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/initialisation.py", line 81, in initialise
reGen | hordelib.comfy_horde.do_comfy_import(
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/comfy_horde.py", line 229, in do_comfy_import
reGen | import execution
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/execution.py", line 13, in <module>
reGen | import nodes
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/nodes.py", line 21, in <module>
reGen | import comfy.diffusers_load
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/diffusers_load.py", line 3, in <module>
reGen | import comfy.sd
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/sd.py", line 5, in <module>
reGen | from comfy import model_management
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 143, in <module>
reGen | total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
reGen | ^^^^^^^^^^^^^^^^^^
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 112, in get_torch_device
reGen | return torch.device(torch.cuda.current_device())
reGen | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 778, in current_device
reGen | _lazy_init()
reGen | File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
reGen | torch._C._cuda_init()
reGen | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
The docker instructions aren't working for me (Arch Linux / nvidia GPU)
git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
Do you have your system set up to make cuda work at all?
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Ironically, getting NVIDIA to work inside Docker is not as painless as with AMD, due to their custom kernel stuff: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
I'm not sure what is and isn't required, since I haven't tested NVIDIA GPUs on Linux for a while, but you might need (some portion of) the CUDA tooling installed locally.
The docker instructions aren't working for me (Arch Linux / nvidia GPU)
git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
Do you have your system set up to make cuda work at all?
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Ironically, getting NVIDIA to work inside Docker is not as painless as with AMD, due to their custom kernel stuff: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.
docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA GeForce RTX 4090
Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.
Please do. I haven't had much to do with the creation of the `Dockerfile.cuda`, and @tazlin found it to be working, iirc. But the `compose.cuda.yaml` is a complete blind shot based on what worked for AMD and what I found online.
There might very well be a few issues with that, especially around exposing the GPU to the container.
There might very well be a few issues with that, especially around exposing the GPU to the container.
Good call. I compared the two `docker-compose.yml` files and found that the GPU configurations were ever so slightly different. With `count: all` added here, the error is now gone:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]
          count: all
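For context, a sketch of where that block sits in a complete `compose.cuda.yaml` service definition. Only the `deploy` section is confirmed by this thread; the service name, image tag, and volume mapping are placeholders:

# Hypothetical compose.cuda.yaml excerpt; everything except the deploy
# block is a placeholder for illustration.
services:
  regen:
    image: horde-worker-regen:cuda
    volumes:
      - ./bridgeData.yaml:/horde-worker-reGen/bridgeData.yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all   # the missing setting; without it the container saw no NVIDIA driver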
I think you can just create a small separate PR to be merged into `raw-png` (not `main`). This wouldn't fit anything I have open, it shouldn't conflict with much else either, and you should be the one credited for fixing what was broken.
I think you can just create a small separate PR to be merged into `raw-png` (not `main`). This wouldn't fit anything I have open, it shouldn't conflict with much else either, and you should be the one credited for fixing what was broken.
Done. https://github.com/Haidra-Org/horde-worker-reGen/pull/334
@CodiumAI-Agent /review
Here are some key observations to aid the review process:
**🎫 Ticket compliance analysis 🔶**

**[333](https://github.com/Haidra-Org/horde-worker-reGen/issues/333) - Partially compliant**

Fully compliant requirements:
- Update PyTorch version to 2.5.0 without breaking older setups.
- Skip installing flash_attn on compatible cards if `FLASH_ATTENTION_USE_TRITON_ROCM=FALSE`.

Not compliant requirements:
- Test 256 head dimensions for potential use in FLUX.1.
- Test if Triton makes the use of flash_attn possible on older RDNA cards.

**[334](https://github.com/Haidra-Org/horde-worker-reGen/issues/334) - Fully compliant**

Fully compliant requirements:
- Ensure the `count: all` setting is included to prevent CUDA unavailability.

**[335](https://github.com/Haidra-Org/horde-worker-reGen/issues/335) - Fully compliant**

Fully compliant requirements:
- Use SIGINT to stop the docker container allowing graceful shutdown.
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪

🧪 No relevant tests

🔒 No security concerns identified

⚡ Recommended focus areas for review

Possible Bug: The method `on_process_ending` is introduced, replacing `on_process_ended`. Ensure that this change is reflected everywhere in the codebase and that it does not introduce any new issues.

Performance Issue: The `sdpa_hijack` function now supports 256 head dimensions. Performance implications of this change should be reviewed, especially under different configurations.
New Features/Updates

- Added a `very_fast_disk_mode` configuration option for concurrent model loading. Defaults to `false`. Set `very_fast_disk_mode: false` to only load one model at a time when it is being explicitly preloaded. There are some cases where it still might attempt to load more than one, but it should happen far less often.

Fixes and Improvements

- `flash_attn` is now skipped on compatible cards when `FLASH_ATTENTION_USE_TRITON_ROCM=FALSE` (per issue 333; see the sketch after these notes).
- Improvements to `high_performance_mode`.
- Improvements for `max_threads` values greater than one: `max_threads: 2` and a bit of tuning, plus `high_performance_mode`, if you have a xx90 card. Note that `high_memory_mode` can still lead to additional instability with threads at 2. Consider `max_threads: 2` in SD1.5-only setups without controlnets/post-processing or in other conservative configurations.
- Images are now encoded from the `rawpng` data directly, reducing redundant operations; the previous approach of `PIL.Image.open(...)` was highly inefficient, especially for very large images.
- Pinned `certifi` to resolve certificate resolution issues.
- Fixed an issue where `download_models.py` would not exit if the compvis models failed to download. This would cause the worker to crash unexpectedly, as it expects the models to be available on worker start.
- See `Dockerfiles/README.md` for information on configuring these images.

Developer changes

- feat: add ROCm and CUDA Dockerfiles with entrypoint and setup scripts
- docker `compose` support (https://github.com/Haidra-Org/horde-worker-reGen/pull/328)

Linked issues: 333, 334, 335
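As referenced from the `flash_attn` note above, here is an illustrative sketch of setting that environment variable for a ROCm worker via compose. The variable name comes from issue 333; the service name is a placeholder, and the thread does not say whether the flag is read at image build time or at container runtime:

# Illustrative only: variable name from issue 333; the service name and the
# choice to set it at runtime (vs. build time) are assumptions.
services:
  regen:
    environment:
      - FLASH_ATTENTION_USE_TRITON_ROCM=FALSE   # skips flash_attn per issue 333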