Haidra-Org / horde-worker-reGen

The default client software to create images for the AI-Horde
https://aihorde.net/
GNU Affero General Public License v3.0

feat: latest torch/comfyui; perf improvements; fix: SSL cert issues #309

Closed · tazlin closed this 1 week ago

tazlin commented 1 month ago

New Features/Updates

Fixes and Improvements

Developer changes

feat: add ROCm and CUDA Dockerfiles with entrypoint and setup scripts


tazlin commented 1 month ago

@CodiumAI-Agent /describe

CodiumAI-Agent commented 1 month ago

Title

(Describe updated until commit https://github.com/Haidra-Org/horde-worker-reGen/commit/f1e88afdbd9c40295a4608a1d434bc8e79022852)

feat: latest torch/comfyui; perf improvements; fix: SSL cert issues


User description

New Features/Updates

Fixes and Improvements

Developer changes

feat: add ROCm and CUDA Dockerfiles with entrypoint and setup scripts



PR Type

Enhancement, Bug fix, Documentation, Tests


Description


Changes walkthrough 📝

Relevant files

Enhancement: 11 files

__init__.py
Bump version to 9.2.0

horde_worker_regen/__init__.py
• Updated version from 9.0.7 to 9.2.0.
+1/-1
amd_go_fast.py
Enhance scaled dot product attention hijack

horde_worker_regen/amd_go_fast/amd_go_fast.py
• Modified the sdpa_hijack function to include the enable_gqa parameter.
• Increased the query shape threshold from 128 to 256.
+5/-2
data_model.py
Add very_fast_disk_mode configuration option

horde_worker_regen/bridge_data/data_model.py
• Added the `very_fast_disk_mode` configuration option.
+3/-0

load_env_vars.py
Load large models environment variable setup

horde_worker_regen/load_env_vars.py
• Added logic to set an environment variable for loading large models.
+5/-0
inference_process.py
Improve model preloading and image processing

horde_worker_regen/process_management/inference_process.py
• Improved logging for model preloading.
• Optimized image processing by using rawpng directly.
+4/-6

process_manager.py
Enhance process management and SSL handling

horde_worker_regen/process_management/process_manager.py
• Added an SSL context using certifi.
• Enhanced process management with better deadlock detection.
• Added properties for RAM in megabytes and gigabytes.
+210/-26
entrypoint.sh
Add Docker entrypoint script for setup and execution

Dockerfiles/entrypoint.sh
• Added an entrypoint script for Docker setup.
• Handles environment setup and worker execution.
+58/-0

setup_rocm.sh
Add ROCm environment setup script

Dockerfiles/setup_rocm.sh
• Added a script to uninstall NVIDIA-specific packages in the ROCm environment.
+5/-0
Dockerfile.cuda
Add Dockerfile for CUDA environment

Dockerfiles/Dockerfile.cuda
• Added a Dockerfile for CUDA environment setup.
• Supports multi-stage builds and dependency installation.
+62/-0

Dockerfile.rocm
Add Dockerfile for ROCm environment

Dockerfiles/Dockerfile.rocm
• Added a Dockerfile for ROCm environment setup.
• Supports multi-stage builds and dependency installation.
+69/-0

bridgeData_template.yaml
Update bridgeData template with new option

bridgeData_template.yaml
• Added the `very_fast_disk_mode` option to the template.
+4/-0
Tests: 1 file

test_horde_dep_updates.py
Add logging for torch version check skips

tests/test_horde_dep_updates.py
• Added logger warnings when torch version checks are skipped.
+14/-0

Documentation: 1 file

README.md
Add Docker usage guide for CUDA and ROCm

Dockerfiles/README.md
• Added a detailed guide for using the Dockerfiles with CUDA and ROCm.
• Includes setup, configuration, and troubleshooting instructions.
+230/-0
Dependencies: 2 files

requirements.txt
Update dependencies to latest versions

requirements.txt
• Updated torch to version 2.5.0.
• Updated horde dependencies to latest versions.
+8/-6

.pre-commit-config.yaml
Update pre-commit hooks

.pre-commit-config.yaml
• Updated pre-commit hooks to latest versions.
+8/-8

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information
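The SSL fix listed for process_manager.py builds an SSL context using certifi. A minimal sketch of that pattern, assuming the worker hands the context to its HTTP layer (the actual wiring in the file may differ):

```python
import ssl

import certifi

# Build a TLS context that trusts certifi's bundled CA certificates rather
# than the system store, which can be missing or stale on some setups and
# then cause CERTIFICATE_VERIFY_FAILED errors when reaching the horde API.
ssl_context = ssl.create_default_context(cafile=certifi.where())

# The context can then be passed to whatever makes the requests, e.g.
# (hypothetically) aiohttp.TCPConnector(ssl=ssl_context) or
# urllib.request.urlopen(url, context=ssl_context).
```

`ssl.create_default_context` enables certificate verification and hostname checking by default, so only the CA source changes.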

CIB commented 3 weeks ago

The docker instructions aren't working for me (Arch Linux / nvidia GPU)

```
git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
```

```
reGen  | [notice] A new release of pip is available: 24.0 -> 24.3.1
reGen  | [notice] To update, run: pip install --upgrade pip
reGen  | 2024-10-30 18:40:57.711 | DEBUG    | horde_worker_regen.load_env_vars:load_env_vars_from_config:68 - Using default AI Horde URL.
reGen  | 2024-10-30 18:40:57.740 | DEBUG    | horde_sdk:_dev_env_var_warnings:42 - AIWORKER_CACHE_HOME is ./models/.
reGen  | 2024-10-30 18:40:59.707 | DEBUG    | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen  | 2024-10-30 18:41:00.050 | DEBUG    | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen  | 2024-10-30 18:41:00.061 | WARNING  | horde_worker_regen.bridge_data.data_model:validate_performance_modes:162 - High memory mode is enabled. You may experience performance issues with more than one thread.
reGen  | 2024-10-30 18:41:00.061 | WARNING  | horde_worker_regen.bridge_data.data_model:validate_performance_modes:167 - Please let us know if `unload_models_from_vram_often` improves or degrades performance with `high_memory_mode` enabled.
reGen  | 2024-10-30 18:41:01.056 | WARNING  | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr for model control_qr
reGen  | 2024-10-30 18:41:01.056 | WARNING  | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr_xl for model control_qr_xl
reGen  | 2024-10-30 18:41:01.061 | DEBUG    | horde_sdk.ai_horde_worker.model_meta:remove_large_models:155 - Removing cascade models: {'Stable Cascade 1.0'}
reGen  | 2024-10-30 18:41:01.061 | DEBUG    | horde_sdk.ai_horde_worker.model_meta:remove_large_models:156 - Removing flux models: {'Flux.1-Schnell fp16 (Compact)', 'Flux.1-Schnell fp8 (Compact)'}
reGen  | /horde-worker-reGen/venv/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
reGen  |   warnings.warn(
reGen  | 2024-10-30 18:41:02.834 | INFO     | horde_safety.deep_danbooru_model:download_deep_danbooru_model:53 - Downloading DeepDanbooru model (~614 mb) to models/clip_blip/model-resnet_custom_v3.pt.
models/clip_blip/model-resnet_custom_v3.pt:   0% 0.00/644M [00:00<?, ?iB/s]
2024-10-30 18:41:03.458 | INFO     | horde_safety.deep_danbooru_model:download_deep_danbooru_model:63 - Model already downloaded.
reGen  | 2024-10-30 18:41:03.458 | INFO     | horde_safety.deep_danbooru_model:verify_deep_danbooru_model_hash:30 - Verifying SHA256 hash of downloaded file.
models/clip_blip/model-resnet_custom_v3.pt:   0% 0.00/644M [00:00<?, ?iB/s]
reGen  | Loading CLIP model ViT-L-14/openai...
reGen  | /horde-worker-reGen/venv/lib/python3.11/site-packages/open_clip/factory.py:372: UserWarning: These pretrained weights were trained with QuickGELU activation but the model config does not have that enabled. Consider using a model config with a "-quickgelu" suffix or enable with a flag.
reGen  |   warnings.warn(
reGen  | Loaded CLIP model and data in 2.94 seconds.
reGen  | 2024-10-30 18:41:06.832 | INFO     | hordelib.comfy_horde:do_comfy_import:215 - Forcing normal vram mode
reGen  | Traceback (most recent call last):
reGen  |   File "/horde-worker-reGen/download_models.py", line 25, in <module>
reGen  |     download_all_models(
reGen  |   File "/horde-worker-reGen/horde_worker_regen/download_models.py", line 58, in download_all_models
reGen  |     hordelib.initialise()
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/initialisation.py", line 81, in initialise
reGen  |     hordelib.comfy_horde.do_comfy_import(
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/comfy_horde.py", line 229, in do_comfy_import
reGen  |     import execution
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/execution.py", line 13, in <module>
reGen  |     import nodes
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/nodes.py", line 21, in <module>
reGen  |     import comfy.diffusers_load
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/diffusers_load.py", line 3, in <module>
reGen  |     import comfy.sd
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/sd.py", line 5, in <module>
reGen  |     from comfy import model_management
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 143, in <module>
reGen  |     total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
reGen  |                                   ^^^^^^^^^^^^^^^^^^
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 112, in get_torch_device
reGen  |     return torch.device(torch.cuda.current_device())
reGen  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 778, in current_device
reGen  |     _lazy_init()
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
reGen  |     torch._C._cuda_init()
reGen  | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
```
HPPinata commented 3 weeks ago

> The docker instructions aren't working for me (Arch Linux / nvidia GPU)
>
> git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
> cd horde-worker-reGen-png/
> git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
> docker compose -f Dockerfiles/compose.cuda.yaml build --pull
> docker compose -f Dockerfiles/compose.cuda.yaml up -dV

Do you have your system set up to make CUDA work at all? `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`

Ironically, getting NVIDIA to work inside Docker is not as painless as AMD, due to their custom kernel stuff: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

HPPinata commented 3 weeks ago

I'm not sure what is and isn't required, since I haven't tested NVIDIA GPUs on Linux for a while, but you might need (some portion of) the CUDA tooling installed locally.

CIB commented 3 weeks ago

> > The docker instructions aren't working for me (Arch Linux / nvidia GPU)
>
> Do you have your system set up to make CUDA work at all? `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi`
>
> Ironically, getting NVIDIA to work inside Docker is not as painless as AMD, due to their custom kernel stuff: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.

```
docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA GeForce RTX 4090
```
HPPinata commented 3 weeks ago

> Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.

Please do. I haven't had much to do with the creation of Dockerfile.cuda, and @tazlin found it to be working, IIRC. But compose.cuda.yaml is a complete blind shot based on what worked for AMD and what I found online. There may very well be a few issues with it, especially around exposing the GPU to the container.

CIB commented 3 weeks ago

> There might very well be a few issues with that, especially around exposing the GPU to the container.

Good call. I compared the two docker-compose.yml files and found that the GPU configurations were ever so slightly different. With `count: all` added here, the error is now gone:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]
          count: all
```
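For context, a fuller sketch of where that fragment sits in a compose file. The service name and build paths below are assumptions based on this thread, not the repository's actual compose.cuda.yaml:

```yaml
services:
  regen:
    build:
      context: ..
      dockerfile: Dockerfiles/Dockerfile.cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # Without an explicit count (or device_ids), some Docker
              # setups expose no GPU to the container at all, which
              # produces the "Found no NVIDIA driver" error above.
              count: all
              capabilities: [gpu]
```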
HPPinata commented 3 weeks ago

I think you can just create a small separate PR to be merged into raw-png (not main). This wouldn't fit into anything I have open, shouldn't conflict with much else either, and you should be the one credited for fixing what was broken.

CIB commented 3 weeks ago

> I think you can just create a small separate PR to be merged into raw-png (not main). This wouldn't fit into anything I have open, shouldn't conflict with much else either, and you should be the one credited for fixing what was broken.

Done: https://github.com/Haidra-Org/horde-worker-reGen/pull/334

tazlin commented 2 weeks ago

@CodiumAI-Agent /review

CodiumAI-Agent commented 2 weeks ago

PR Reviewer Guide 🔍

(Review updated until commit https://github.com/Haidra-Org/horde-worker-reGen/commit/50c53346fe626c86668a88356a0026f9f4dc7e04)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

[333](https://github.com/Haidra-Org/horde-worker-reGen/issues/333) - Partially compliant

Fully compliant requirements:
- Update PyTorch version to 2.5.0 without breaking older setups.
- Skip installing flash_attn on compatible cards if `FLASH_ATTENTION_USE_TRITON_ROCM=FALSE`.

Not compliant requirements:
- Test 256 head dimensions for potential use in FLUX.1.
- Test if Triton makes the use of flash_attn possible on older RDNA cards.

[334](https://github.com/Haidra-Org/horde-worker-reGen/issues/334) - Fully compliant

Fully compliant requirements:
- Ensure the `count: all` setting is included to prevent CUDA unavailability.

[335](https://github.com/Haidra-Org/horde-worker-reGen/issues/335) - Fully compliant

Fully compliant requirements:
- Use SIGINT to stop the Docker container, allowing graceful shutdown.

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Bug
The method `on_process_ending` is introduced, replacing `on_process_ended`. Ensure that this change is reflected everywhere in the codebase and that it does not introduce any new issues.

Performance Issue
The `sdpa_hijack` function now supports 256 head dimensions. The performance implications of this change should be reviewed, especially under different configurations.
CodiumAI-Agent commented 1 week ago

Persistent review updated to latest commit https://github.com/Haidra-Org/horde-worker-reGen/commit/50c53346fe626c86668a88356a0026f9f4dc7e04