Tencent / HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
https://dit.hunyuan.tencent.com/

feat: Add Compatibility for Windows #101

Open C0nsumption opened 1 week ago

C0nsumption commented 1 week ago

Windows?

feat: Add Compatibility for Windows

Description

This pull request introduces several changes to ensure compatibility with Windows and the most recent versions of various modules. The following modifications have been made:

  1. Module Upgrades:

    • Upgraded most modules to their latest versions, including torch, diffusers, etc.
  2. Downgrade NumPy:

    • Downgraded NumPy to a version below 2.0.0 to avoid ABI incompatibilities with modules compiled against NumPy 1.x.
      pip install "numpy<2.0.0"
  3. Conditional Import and Usage of deepspeed:

    • Added a try-except block to conditionally import deepspeed and add its related arguments only if deepspeed is available.
    • This avoids potential ImportError and makes the script more robust.
  4. Enhanced Image-Saving Logic:

    • Introduced the get_next_index function to handle non-integer filenames during the image-saving process.
    • This change ensures that the script can handle filenames that do not conform to an integer pattern without breaking.

Installation Instructions

  1. Clone the Repository:

    git clone https://github.com/tencent/HunyuanDiT
    cd HunyuanDiT
  2. Install Dependencies:

    Using a Virtual Environment

    python -m venv venv
    venv\Scripts\activate
    pip install "huggingface_hub[cli]"
    mkdir ckpts
    huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
    
    python.exe -m pip install --upgrade pip
    
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121   
    pip install loguru
    pip install diffusers
    pip install transformers
    pip install timm
    pip install einops
    pip install peft
    pip install sentencepiece
    pip install protobuf
    pip install "numpy<2.0.0"    

    Note: These are not the versions pinned in requirements.txt, but they allow the use of CUDA 12.1 and the newest versions of diffusers and PyTorch. (Same setup as ComfyUI, I believe.)

    Using Conda for Closer Compatibility with Original Repository

    conda create -n HunyuanDit python=3.9 -c conda-forge -y
    conda activate HunyuanDit
    
    # Install CUDA 11.7 from conda-forge
    conda install cudatoolkit=11.7 -c conda-forge
    
    set CUDA_HOME=%CONDA_PREFIX%
    set PATH=%CUDA_HOME%\bin;%PATH%
    
    python -m pip install --upgrade pip
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    pip install loguru==0.7.2
    pip install diffusers==0.21.2
    pip install timm==0.9.5
    pip install einops==0.7.0
    pip install transformers==4.39.1
    pip install peft==0.10.0
    pip install "numpy<2.0"
    pip install sentencepiece==0.1.99
    pip install setuptools==65.5.1
    pip install protobuf==3.19.0
    pip install wheel
    pip install packaging
  3. Run Inference:

    python sample_t2i.py --prompt "a woman" --no-enhance
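After either install route, a quick sanity check can confirm the NumPy pin took effect before running inference. This is a hypothetical helper, not part of the PR; the name `numpy_pin_ok` is assumed:

```python
from importlib.metadata import version, PackageNotFoundError

def numpy_pin_ok():
    """Return True if the installed NumPy satisfies the <2.0.0 pin."""
    try:
        major = int(version("numpy").split(".")[0])
    except PackageNotFoundError:
        return False  # NumPy missing entirely; run: pip install "numpy<2.0.0"
    return major < 2

if __name__ == "__main__":
    print("numpy<2.0.0 pin satisfied:", numpy_pin_ok())
```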

WHY?

Error Encountered

Solution

  1. Downgrade NumPy:

    pip install "numpy<2.0.0"
  2. Changes in hydit/modules/models.py:

    use_flash_attn = args.infer_mode == 'fa' or getattr(args, 'use_flash_attn', False)
    • Explanation: The getattr call reads args.use_flash_attn; if the attribute does not exist, it falls back to False instead of raising an AttributeError, preserving the intended behavior on configurations that never set the flag.
  3. Changes in hydit/config.py:

    try:
        import deepspeed
    
        # Add DeepSpeed-specific arguments
        parser = deepspeed.add_config_arguments(parser)
        parser.add_argument('--local_rank', type=int, default=None,
                            help='local rank passed from distributed launcher.')
        parser.add_argument('--deepspeed-optimizer', action='store_true',
                            help='Switching to the optimizers in DeepSpeed')
        parser.add_argument('--remote-device', type=str, default='none', choices=['none', 'cpu', 'nvme'],
                            help='Remote device for ZeRO-3 initialized parameters.')
        parser.add_argument('--zero-stage', type=int, default=1)
        parser.add_argument("--async-ema", action="store_true", help="Whether to use multi stream to execute EMA.")
    except ImportError:
        print("DeepSpeed not available. Skipping related arguments...")
    • Explanation: The try-except block imports deepspeed and registers its arguments only when the import succeeds, so the parser works on systems without DeepSpeed (such as Windows) instead of failing with an ImportError.
  4. Changes in sample_t2i.py:

    def get_next_index(save_dir):
        all_files = list(save_dir.glob('*.png'))
        indices = []
        for f in all_files:
            try:
                indices.append(int(f.stem))
            except ValueError:
                logger.warning(f"Skipping file with non-integer name: {f}")
        return max(indices, default=-1) + 1
    
    # Find the first available index
    start = get_next_index(save_dir)
    • Explanation: Introduced the get_next_index function to handle non-integer filenames gracefully during the image-saving process. This ensures that the script can handle filenames that do not conform to an integer pattern without breaking.
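As a quick check, the helper above can be exercised against a throwaway directory; in this self-contained sketch the loguru call is replaced with a plain print:

```python
import tempfile
from pathlib import Path

def get_next_index(save_dir):
    """Return one past the highest integer *.png stem in save_dir."""
    indices = []
    for f in save_dir.glob('*.png'):
        try:
            indices.append(int(f.stem))
        except ValueError:
            # Non-integer names (e.g. 'preview.png') are skipped, not fatal
            print(f"Skipping file with non-integer name: {f}")
    return max(indices, default=-1) + 1

with tempfile.TemporaryDirectory() as d:
    save_dir = Path(d)
    for name in ('0.png', '7.png', 'preview.png'):
        (save_dir / name).touch()
    print(get_next_index(save_dir))  # 8: one past the highest integer stem
```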

Testing

Known Bugs

jroubi commented 5 days ago

Which Python version did you use? In the conda environment it explicitly says 3.9.

But did you use the latest version of Python for the virtual env?

C0nsumption commented 2 days ago

Which Python version did you use? In the conda environment it explicitly says 3.9.

But did you use the latest version of Python for the virtual env?

3.10.9

My bad for the delay bud, no GitHub notification; I think one only shows up when you're quoted. But honestly, if you are just doing inference, my assumption is you should be fine even with 3.11 and maybe 3.12, since all the errors that pop up without these changes are training-related. They have nothing to do with actual inference.