erew123 / deepspeedpatcher

A graphical tool to simplify building and installing DeepSpeed 0.15.x or later on Windows systems.

DeepSpeed Windows Patcher

A graphical tool to simplify building and installing DeepSpeed on Windows systems. This tool automates the build process, manages dependencies, and provides clear guidance for CUDA setup. FYI, this started out as a patcher for earlier builds of DeepSpeed. I later reworked the code into a builder for DeepSpeed 0.15.x and later, but I'm too lazy to change the name.

NOTE: If you cannot use the tool for some reason, manual build instructions are at the bottom of this page under "Manual Builds of DeepSpeed 0.15.0 and later".


Important Version Compatibility Information

DeepSpeed wheels are environment-specific and must be built for your exact configuration. A built wheel is tied to:

  1. Python Version

    • Major and minor version must match exactly (e.g., 3.10.x, 3.11.x)
    • A wheel built for Python 3.10 won't work on Python 3.11 or 3.9
  2. PyTorch Version

    • Must match the major and minor version used during build
    • A wheel built with PyTorch 2.2.1 won't work with PyTorch 2.3.0
    • A wheel built with PyTorch 2.1.0 won't work with PyTorch 2.2.0
  3. CUDA Versions (There are TWO different CUDA versions to consider):

    a) PyTorch CUDA Version

    • This is the CUDA version that PyTorch was built with
    • Can be checked with torch.version.cuda
    • This is determined when you install PyTorch
    • Example: PyTorch 2.1.0+cu121 uses CUDA 12.1

    b) NVIDIA CUDA Toolkit Version

    • This is the full CUDA development toolkit installed on your system
    • Used for compiling DeepSpeed's CUDA extensions
    • Must be installed separately from PyTorch
    • Should be compatible with (same or newer than) your PyTorch CUDA version

    For example:

    • If PyTorch uses CUDA 11.8 → NVIDIA CUDA Toolkit should be 11.8 or higher
    • If PyTorch uses CUDA 12.1 → NVIDIA CUDA Toolkit should be 12.1 or higher
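
The "same or newer" rule above can be checked with a few lines of Python. This is a minimal sketch, not the tool's own code: the helper names `parse_nvcc_release` and `toolkit_is_compatible` are hypothetical, and it assumes the output of `nvcc --version` contains a fragment of the form `release X.Y`.

```python
import re
from typing import Tuple


def parse_nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_is_compatible(torch_cuda: str, nvcc_output: str) -> bool:
    """True when the toolkit is the same as or newer than PyTorch's CUDA version."""
    torch_version = tuple(int(part) for part in torch_cuda.split(".")[:2])
    return parse_nvcc_release(nvcc_output) >= torch_version


# Sample text in the shape nvcc prints; real output has a few more lines.
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(toolkit_is_compatible("12.1", sample))  # True
print(toolkit_is_compatible("12.4", sample))  # False
```

In a real environment you would pass `torch.version.cuda` as the first argument and the captured standard output of `nvcc --version` as the second.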

Version Compatibility Example

Environment A (where wheel was built):
- Python 3.11.5
- PyTorch 2.1.0+cu121 (CUDA 12.1)
- NVIDIA CUDA Toolkit 12.1

This wheel will ONLY work in environments with:
- Python 3.11.x (any patch version of 3.11)
- PyTorch 2.1.x (must match major.minor version)
- PyTorch built with CUDA 12.1
- NVIDIA CUDA Toolkit 12.1 or higher
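
The Python and PyTorch matching rules in this example can be written as a small predicate. This is only a sketch: `WheelEnv`, `major_minor`, and `wheel_matches` are hypothetical names, and the version strings are assumed to look like the values of `platform.python_version()`, `torch.__version__`, and `torch.version.cuda`.

```python
from dataclasses import dataclass


def major_minor(version: str) -> str:
    """Reduce '2.1.0+cu121' or '3.11.5' to 'major.minor'."""
    return ".".join(version.split("+")[0].split(".")[:2])


@dataclass(frozen=True)
class WheelEnv:
    python: str      # e.g. "3.11.5"
    torch: str       # e.g. "2.1.0+cu121"
    torch_cuda: str  # e.g. "12.1"


def wheel_matches(built: WheelEnv, target: WheelEnv) -> bool:
    """Same Python major.minor, same PyTorch major.minor, same PyTorch CUDA."""
    return (
        major_minor(built.python) == major_minor(target.python)
        and major_minor(built.torch) == major_minor(target.torch)
        and major_minor(built.torch_cuda) == major_minor(target.torch_cuda)
    )


built = WheelEnv("3.11.5", "2.1.0+cu121", "12.1")
print(wheel_matches(built, WheelEnv("3.11.9", "2.1.2+cu121", "12.1")))   # True
print(wheel_matches(built, WheelEnv("3.10.11", "2.1.0+cu121", "12.1")))  # False
```

The sketch deliberately leaves out the CUDA Toolkit rule, since that depends on what is installed on the machine rather than on the Python environment.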

Common Compatibility Issues

Purpose

Building DeepSpeed on Windows can be challenging due to specific requirements and environment setup needs. This tool:

System Requirements

Mandatory Prerequisites

  1. NVIDIA GPU with CUDA Support

    • Compatible NVIDIA GPU
    • NVIDIA Display Driver installed
  2. NVIDIA CUDA Toolkit

    • Version 11.0 or later required
    • Must include:
      • CUDA Development Compiler (nvcc)
      • CUDA Development Libraries (CUBLAS)
      • CUDA Runtime Libraries
    • Download from: NVIDIA CUDA Toolkit Archive
  3. Visual Studio with C++ Build Tools

    • Visual Studio 2019 or 2022 (Community Edition or BuildTools)
    • Must include "Desktop development with C++"
    • Download from: Visual Studio Community
  4. Python Environment

    • Python 3.8 or later
    • PyTorch installed with CUDA support
    • Admin rights for installation (This is a Microsoft requirement for DeepSpeed builds).

Python Package Dependencies

Download and Usage

Getting the Tool

  1. Clone the repository:
    git clone https://github.com/erew123/deepspeedpatcher

Before Running the Tool

⚠️ IMPORTANT: Complete these steps in order:

  1. Install Visual Studio or Visual Studio Build Tools

    • Install Visual Studio 2019/2022 Community or Build Tools
    • During installation, select "Desktop development with C++"
      • ✅ MSVC v142 - VS 2019 C++ x64/x86 build tools (Latest version is fine)
      • ✅ Windows 10 or 11 SDK (Any recent version, so 10.0.19041.0 or newer is fine)
      • ✅ C++ core features - Build Tools
      • ✅ C++/CLI support for v142 build tools
      • ✅ C++ Modules for v142 build tools
      • ✅ C++ CMake tools for Windows
    • This must be done before proceeding
  2. Install NVIDIA CUDA Toolkit

    • Install the appropriate version based on your PyTorch's CUDA version
    • Example: For PyTorch with CUDA 12.1, install CUDA Toolkit 12.1
    • Must include development components (nvcc compiler)
    • Only specific components should be required:
      • ✅ CUDA > Development > Compiler > nvcc
      • ✅ CUDA > Development > Libraries > CUBLAS
      • ✅ CUDA > Runtime > Libraries > CUBLAS
  3. Set Up Python Environment

    • Create and activate your Python environment (venv, conda, etc.)
    • Install PyTorch with the desired CUDA version
    • Verify PyTorch CUDA is working:
      python -c "import torch; print(f'PyTorch CUDA available: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"

Running the Tool

  1. Activate Your Target Environment

    # If using venv
    .\venv\Scripts\activate
    
    # If using conda
    conda activate your_environment_name
  2. Launch the Tool

    python builddeepspeed.py
    • The tool will request administrative privileges
    • It will perform system checks automatically

Important Notes

Verification After Building

After building and installing the wheel to your Python environment, verify DeepSpeed with ds_report or:

python -m deepspeed.env_report

Common Setup Mistakes to Avoid

  1. Wrong Environment Active

    • Building a wheel in one environment but trying to use it in another that doesn't match
    • Not activating the target environment before running the tool
    • Not setting the CUDA_HOME environment variable after installing the wheel and starting your Python environment
  2. Incorrect Order of Installation

    • Installing CUDA Toolkit after PyTorch
    • Not having Visual Studio/Build Tools installed first
  3. Version Mismatches

    • PyTorch CUDA version doesn't match CUDA Toolkit
    • Using wrong Python version for your needs

Quick Environment Check

Run these commands before starting to verify your setup (these checks are performed by the tool on start-up):

# Check Python version
python --version

# Check PyTorch and its CUDA version
python -c "import torch; cuda = torch.version.cuda if torch.cuda.is_available() else 'Not Available'; print(f'PyTorch {torch.__version__}, CUDA {cuda}')"

# Check CUDA Toolkit
nvcc --version

All these components must be correctly installed and compatible before running the tool.

Startup Checks

The tool performs several verification steps on launch:

  1. Administrative Rights Check

    • Verifies admin privileges
    • Offers to restart with elevated privileges if needed
    • Microsoft's routine for compiling DeepSpeed requires Admin rights.
  2. Visual Studio Detection

    • Checks for VS2019/VS2022 installations
    • Verifies presence of C++ build tools
  3. CUDA Toolkit Verification

    • Scans for installed CUDA versions
    • Verifies nvcc compiler availability
    • Checks CUBLAS presence
  4. Python Environment Check

    • Verifies Python version
    • Checks PyTorch installation and CUDA availability
    • Auto-installs missing dependencies (except PyTorch)
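
Two of these checks are easy to illustrate. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: `is_admin` and `missing_tools` are hypothetical helper names, the admin test uses the Windows `shell32.IsUserAnAdmin` call through `ctypes`, and the tool lookup only searches `PATH`, whereas the tool itself scans for CUDA installations.

```python
import ctypes
import os
import shutil
from typing import Iterable, List


def is_admin() -> bool:
    """Return True when the current process has administrative rights."""
    try:
        # Windows: shell32.IsUserAnAdmin returns non-zero for elevated processes.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except AttributeError:
        # ctypes.windll does not exist off Windows; fall back to the POSIX root check.
        return os.geteuid() == 0


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("Running with admin rights:", is_admin())
print("Not on PATH:", missing_tools(["nvcc", "cl"]))
```

A missing `cl` is expected outside a Developer Command Prompt, so treat the second line as a hint rather than a failure.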

Usage Guide

Building DeepSpeed

  1. Launch the Application

    • Run as administrator
    • The tool will perform initial system checks
  2. Configure Build Settings

    • Select DeepSpeed version
    • Choose CUDA version
    • Set installation directory
    • Configure build options (typically left unchecked for Windows)
  3. Build Options

    • Build Only: Creates wheel file without installation
    • Install Built Wheel: Installs previously built wheel
    • Build and Install: Performs both operations
    • CUDA_HOME Setup Guide: Shows CUDA_HOME configuration instructions

Build Management

The tool manages builds in an organized way:

  1. Work Directory Structure

    root/
    ├── deepspeed/          # Temporary build directory
    └── deepspeed_wheels/   # Archive directory
       └── deepspeed_[version]_cuda[version]_py[version]/
           └── wheelfile.whl
  2. Cleanup Process

    • Automatically cleans build directory before each build
    • Preserves built wheels in version-specific archives
    • Maintains separate directories for different configurations
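
The layout and cleanup behaviour above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the exact format of the version strings inside the folder name (for example `12.1` versus `121`) is an assumption, not something taken from the tool's source.

```python
import shutil
from pathlib import Path


def archive_dir_name(ds_version: str, cuda_version: str, py_version: str) -> str:
    """Build a folder name following deepspeed_[version]_cuda[version]_py[version]."""
    return f"deepspeed_{ds_version}_cuda{cuda_version}_py{py_version}"


def clean_build_dir(root: Path) -> Path:
    """Remove and recreate the temporary build directory before a build."""
    build_dir = root / "deepspeed"
    shutil.rmtree(build_dir, ignore_errors=True)
    build_dir.mkdir(parents=True)
    return build_dir


def archive_wheel(wheel: Path, root: Path, ds_version: str,
                  cuda_version: str, py_version: str) -> Path:
    """Copy a built wheel into its configuration-specific archive folder."""
    target_dir = root / "deepspeed_wheels" / archive_dir_name(
        ds_version, cuda_version, py_version)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(wheel, target_dir))
```

Because the archive lives outside the `deepspeed/` build directory, cleaning before the next build does not touch previously built wheels.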

Configuration File (deepspeed_config.json)

The tool uses a JSON configuration file to manage available versions:

{
    "versions": {
        "0.15.0": {
            "url": "https://github.com/microsoft/DeepSpeed/archive/refs/tags/v0.15.0.zip",
            "cuda_min": "11.0"
        },
        "0.15.1": {
            "url": "https://github.com/microsoft/DeepSpeed/archive/refs/tags/v0.15.1.zip",
            "cuda_min": "11.0"
        }
    }
}

Adding New Versions

To add support for new DeepSpeed versions:

  1. Add a new entry to the "versions" object
  2. Specify the GitHub release URL
  3. Set minimum CUDA version requirement

Note: This tool supports DeepSpeed 0.15.0 and later. Earlier versions may have different build requirements and are not supported.
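
If you prefer to script the three steps, a sketch along these lines would work. The helper names `add_version` and `update_config_file` are hypothetical; the URL pattern is copied from the entries shown in the configuration file, and you should still confirm that the release tag exists on the DeepSpeed GitHub page before adding it.

```python
import json
from pathlib import Path

# URL pattern taken from the existing entries in deepspeed_config.json.
RELEASE_URL = "https://github.com/microsoft/DeepSpeed/archive/refs/tags/v{version}.zip"


def add_version(config: dict, version: str, cuda_min: str = "11.0") -> dict:
    """Return a copy of the config with one more entry under "versions"."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (0, 15):
        raise ValueError("only DeepSpeed 0.15.0 and later are supported")
    versions = dict(config.get("versions", {}))
    versions[version] = {
        "url": RELEASE_URL.format(version=version),
        "cuda_min": cuda_min,
    }
    return {**config, "versions": versions}


def update_config_file(path: Path, version: str, cuda_min: str = "11.0") -> None:
    """Read deepspeed_config.json, add the version, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = add_version(config, version, cuda_min)
    path.write_text(json.dumps(updated, indent=4), encoding="utf-8")
```

For example, `update_config_file(Path("deepspeed_config.json"), "0.15.2")` would append a `0.15.2` entry with a minimum CUDA version of 11.0.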

Additional Notes

  1. CUDA_HOME Environment

    • The tool provides guidance for setting CUDA_HOME
    • Different options for system-wide, conda, and venv setups
    • Verification steps included
  2. Build Artifacts

    • Wheels are archived with version information
    • Each build creates a clean environment
    • Previous builds are preserved in the archive directory
  3. Error Handling

    • Detailed error messages in the log
    • Suggestions for common issues
    • Build process can be retried if needed
  4. Log File

    • All operations are logged to 'deepspeed_build.log'
    • Includes timestamps and detailed progress information
    • Useful for troubleshooting

Troubleshooting

  1. Build Failures

    • Verify CUDA installation completeness
    • Check Visual Studio installation
    • Ensure PyTorch is installed with CUDA support
    • Review log file for specific errors
  2. CUDA Issues

    • Confirm CUDA_HOME is set correctly
    • Verify nvcc.exe is accessible
    • Check CUDA version compatibility with PyTorch
  3. Installation Issues

    • Run as administrator
    • Check Python environment isolation
    • Verify all prerequisites are met

Notes for Developers

Manual Builds of DeepSpeed 0.15.0 and later

Building DeepSpeed on Windows - Key Requirements and Observations

Required Software

Note: If you use Visual Studio Code instead of Visual Studio, make sure you have installed the actual Visual Studio Build Tools separately. VS Code's C++ extension alone is not sufficient.

CUDA Toolkit Installation

Visual Studio Environment Differences

  1. Visual Studio (Full Version):

    • Developer Command Prompt available from Start Menu
    • VS environment variables automatically set
    • 64-bit toolchain readily available
  2. Visual Studio Code:

    • No built-in Developer Command Prompt
    • Must manually run vcvars64.bat from: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat
    • Important: Must use 64-bit environment (vcvars64.bat, not vcvars32.bat)
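
When driving a build from a script instead of a Developer Command Prompt, one common approach is to run `vcvars64.bat` in a child `cmd.exe`, print the resulting environment with `set`, and reuse it. The sketch below assumes the Build Tools path quoted above; `parse_set_output` and `load_vs_environment` are hypothetical names, and the second function only works on Windows.

```python
import subprocess
from typing import Dict

# Path from the text above; adjust for your Visual Studio edition and year.
VCVARS64 = (r"C:\Program Files (x86)\Microsoft Visual Studio\2019"
            r"\BuildTools\VC\Auxiliary\Build\vcvars64.bat")


def parse_set_output(text: str) -> Dict[str, str]:
    """Turn the NAME=value lines printed by `set` into a dictionary."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:  # skip banner lines that contain no '='
            env[name] = value
    return env


def load_vs_environment(vcvars: str = VCVARS64) -> Dict[str, str]:
    """Run vcvars64.bat in cmd.exe and capture the environment it sets (Windows only)."""
    result = subprocess.run(
        f'cmd /c ""{vcvars}" && set"',
        capture_output=True, text=True, check=True,
    )
    return parse_set_output(result.stdout)
```

The returned dictionary can be passed as `env=` to a later `subprocess.run` call so that `cl` and the Windows SDK are visible to the build.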

Key Findings and Gotchas

Environment Setup is Critical:

Build Process:

Common Issues:

Manual Build Steps

After downloading and extracting a DeepSpeed 0.15.x (or later) release, open a Visual Studio x64 Developer Command Prompt as Administrator (or initialize the VS environment manually), then activate the Python environment you want to build for.

You will need to copy/paste the following commands into the command prompt window.

Set the required environment variables to point at your NVIDIA CUDA Toolkit root, i.e. the folder that contains the bin directory where nvcc.exe is located. The paths below are an example; confirm the correct location on your system:

set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
set DISTUTILS_USE_SDK=1

You can verify that the environment is set correctly by testing nvcc and cl:

rem Should show the CUDA version
nvcc -V
rem Should show the MSVC version
cl

Set the following DeepSpeed build options to 0:

set DS_BUILD_AIO=0
set DS_BUILD_CUTLASS_OPS=0
set DS_BUILD_EVOFORMER_ATTN=0
set DS_BUILD_FP_QUANTIZER=0
set DS_BUILD_RAGGED_DEVICE_OPS=0
set DS_BUILD_SPARSE_ATTN=0

If necessary, run set at the command prompt to list all environment variables that are currently set.

At this point you should be in an administrative x64 Developer Command Prompt with the target Python environment activated, CUDA_HOME and CUDA_PATH set correctly, and the DeepSpeed build options set. Move into your extracted DeepSpeed folder and run:

python setup.py bdist_wheel

When the build completes successfully, the DeepSpeed folder will contain a dist folder holding your compiled wheel. You can install it with pip install deepspeed-xxxxxxxxxx.whl, where the x's will be unique to your environment build.
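
The same steps can be driven from Python. This is a sketch under the assumptions stated in the manual steps, not the tool's own build script: `build_env` and `build_wheel` are hypothetical names, and the script must still be started from an administrative x64 Developer Command Prompt so that the MSVC variables are already present in the inherited environment.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping

# The six options the manual steps set to 0.
DISABLED_OPS = (
    "DS_BUILD_AIO",
    "DS_BUILD_CUTLASS_OPS",
    "DS_BUILD_EVOFORMER_ATTN",
    "DS_BUILD_FP_QUANTIZER",
    "DS_BUILD_RAGGED_DEVICE_OPS",
    "DS_BUILD_SPARSE_ATTN",
)


def build_env(cuda_root: str, base: Mapping[str, str]) -> Dict[str, str]:
    """Copy an environment and add the variables from the manual steps."""
    env = dict(base)
    env.update({"CUDA_HOME": cuda_root, "CUDA_PATH": cuda_root,
                "DISTUTILS_USE_SDK": "1"})
    env.update({name: "0" for name in DISABLED_OPS})
    return env


def build_wheel(source_dir: str, cuda_root: str) -> None:
    """Run `python setup.py bdist_wheel` inside the extracted DeepSpeed folder."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=source_dir, env=build_env(cuda_root, os.environ), check=True,
    )
```

Using `sys.executable` keeps the build tied to whichever Python environment is currently active, which is what determines the environments the wheel will work in.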

Verification

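After building and installing the wheel, verify CUDA availability with ds_report or a Python script:

import torch
import deepspeed
# Importing the op builder confirms that DeepSpeed's ops package loads
from deepspeed.ops.op_builder import OpBuilder

# Should print True when PyTorch can see a CUDA device
print(torch.cuda.is_available())
print(deepspeed.__version__)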

Additional Notes

  • The build process needs VS2019 or newer (VS2019 Build Tools is the minimum requirement).
  • The CUDA version must be compatible with the installed PyTorch version.
  • Wheel files are specific to Python version, CUDA version, and Windows architecture.
  • Building in a clean directory prevents potential conflicts.
  • Set your CUDA_HOME path: DeepSpeed needs to find and access the NVIDIA CUDA Toolkit's nvcc.exe each time it starts up. Every Python environment that uses DeepSpeed therefore needs its CUDA_HOME environment variable set correctly. Guides on doing this are in the tool.
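
A quick way to confirm that CUDA_HOME is usable from a given Python environment is to look for nvcc under its bin folder. This is a minimal sketch with a hypothetical helper name, `find_nvcc`; it only checks that the file exists and does not run the compiler.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_nvcc(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the nvcc path under CUDA_HOME, or None if it cannot be found."""
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return None
    for name in ("nvcc.exe", "nvcc"):  # Windows name first, then the POSIX name
        candidate = Path(cuda_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


print("nvcc via CUDA_HOME:", find_nvcc())
```

A result of `None` means either CUDA_HOME is not set in this environment or it does not point at the toolkit root.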

Manual Instructions Flowchart

flowchart TD
    A[Start Manual Build Process] --> B[Prerequisites Check]

    B --> C[Visual Studio Requirements]
    C --> C1[VS2019 or newer with C++ Workload]
    C1 --> C2[Required Components Check]
    C2 --> C3["MSVC v142 Build Tools
    Windows 10 SDK
    C++ Core Features
    C++/CLI Support
    C++ Modules
    C++ CMake Tools"]

    B --> D[CUDA Toolkit Requirements]
    D --> D1[Install Required Components]
    D1 --> D2["CUDA Compiler (nvcc)
    CUBLAS Development
    CUBLAS Runtime"]

    B --> E[Python Requirements]
    E --> E1["Compatible PyTorch CUDA
    Ninja Build System
    psutil"]

    C3 & D2 & E1 --> F[Environment Setup]

    F --> G[Launch VS x64 Developer Command Prompt as Administrator]

    G --> I[Environment Variables Required]
    I --> I1["CUDA_HOME
    CUDA_PATH
    DISTUTILS_USE_SDK"]

    I1 --> I2["Instructions:
    set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
    set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
    set DISTUTILS_USE_SDK=1"]

    I2 --> J[DeepSpeed Build Options Required]
    J --> J1["DS_BUILD_AIO
    DS_BUILD_CUTLASS_OPS
    DS_BUILD_EVOFORMER_ATTN
    DS_BUILD_FP_QUANTIZER
    DS_BUILD_RAGGED_DEVICE_OPS
    DS_BUILD_SPARSE_ATTN"]

    J1 --> J2["Instructions:
    set DS_BUILD_AIO=0
    set DS_BUILD_CUTLASS_OPS=0
    set DS_BUILD_EVOFORMER_ATTN=0
    set DS_BUILD_FP_QUANTIZER=0
    set DS_BUILD_RAGGED_DEVICE_OPS=0
    set DS_BUILD_SPARSE_ATTN=0"]

    J2 --> K[Optional Environment Verification]
    K --> |Optional| K1[Test nvcc -V]
    K1 --> |Optional| K2[Test cl]

    K --> L[Build Process]
    K1 --> L
    K2 --> L
    L --> M[python setup.py bdist_wheel]

    M --> N{Build Successful?}
    N -->|Yes| O[Wheel file in dist folder]
    O --> P[Install with pip]
    P --> Q[Verify Installation]
    Q --> Q1["Run ds_report
    Test torch.cuda.is_available()
    Check for CUDA warnings"]

    N -->|No| R[Common Issues]
    R --> R1["Check VS x64 environment
    Verify CUDA paths
    Check admin rights
    Wait for CUDA compilation"]
    R1 --> F

    classDef prereq fill:#cce5ff,stroke:#004085,stroke-width:1px;
    classDef env fill:#fff3cd,stroke:#856404,stroke-width:1px;
    classDef vars fill:#f8d7da,stroke:#721c24,stroke-width:1px;
    classDef instructions fill:#d4edda,stroke:#155724,stroke-width:1px;
    classDef build fill:#d1ecf1,stroke:#0c5460,stroke-width:1px;
    classDef verify fill:#e2e3e5,stroke:#383d41,stroke-width:1px;
    classDef error fill:#f8d7da,stroke:#721c24,stroke-width:1px;
    classDef optional fill:#e2e3e5,stroke:#383d41,stroke-width:1px,stroke-dasharray: 5 5;

    class B,C,C1,C2,C3,D,D1,D2,E,E1 prereq;
    class F,G env;
    class I,J vars;
    class I1,J1 vars;
    class I2,J2 instructions;
    class L,M,N,O,P build;
    class Q,Q1 verify;
    class R,R1 error;
    class K,K1,K2 optional;

Support

For DeepSpeed-specific issues, refer to:

For tool-specific issues:


Application Flowchart

flowchart TD
    A[Start Application] --> B[Check Admin Rights]
    B --> C{Admin Rights?}
    C -->|No| D[Prompt for Admin Restart]
    D --> E{User Accepts?}
    E -->|Yes| F[Restart with Admin]
    E -->|No| G[Exit Application]
    C -->|Yes| H[Load Configuration JSON]
    H --> I{Config Loaded?}
    I -->|No| J[Show Error and Exit]
    I -->|Yes| K[Initialize GUI]

    K --> L[Check Prerequisites]
    L --> M[Check Visual Studio]
    M --> N{VS Found in Default Path?}
    N -->|No| O[Check Registry for VS]
    O --> P{VS Found in Registry?}
    P -->|No| Q[Show VS Install Instructions]
    P -->|Yes| R[Log VS Location]
    N -->|Yes| R

    L --> S[Check CUDA Toolkit Installation]
    S --> T{CUDA Toolkit Found?}
    T -->|No| U[Show CUDA Toolkit Install Instructions]
    T -->|Yes| V[Check for NVCC]

    L --> W[Check Python Packages]
    W --> X[Check PyTorch]
    W --> Y[Check Ninja]
    W --> Z[Check psutil]

    R & V & X & Y & Z --> AA{All Prerequisites Met?}
    AA -->|No| AB[Show Missing Prerequisites]
    AA -->|Yes| AC[Enable Build/Install Buttons]

    AC --> AD[Wait for User Action]
    AD --> AE{Action Selected}
    AE -->|Build Only| AF[Start Build Process]
    AE -->|Install Only| AG[Start Install Process]
    AE -->|Build & Install| AH[Start Combined Process]

    AF & AH --> AI[Create Build Directory]
    AI --> AJ[Download DeepSpeed from GitHub]
    AJ --> AK[Extract ZIP Archive]
    AK --> AL[Move Files to Build Dir]
    AL --> AM[Create Build Script]

    AM --> AN[Set Environment Variables]
    AN --> AO[Run Build Script]
    AO --> AP{Build Successful?}
    AP -->|No| AQ[Show Build Error]
    AP -->|Yes| AR[Archive Wheel File]

    AR --> AS{Install Requested?}
    AS -->|No| AT[Show Build Success]
    AS -->|Yes| AU[Uninstall Existing DeepSpeed]
    AU --> AV[Install New Wheel]
    AV --> AW{Install Successful?}
    AW -->|No| AX[Show Install Error]
    AW -->|Yes| AY[Show Success Message]

    AY --> AZ{Show CUDA Setup?}
    AZ -->|Yes| BA[Display CUDA Setup Guide]
    AZ -->|No| BB[End Process]

    classDef startEnd fill:#f9d5e5,stroke:#333,stroke-width:2px;
    classDef process fill:#eeeeee,stroke:#333,stroke-width:1px;
    classDef decision fill:#e3f2fd,stroke:#333,stroke-width:1px;
    classDef error fill:#ffcdd2,stroke:#333,stroke-width:1px;
    classDef success fill:#c8e6c9,stroke:#333,stroke-width:1px;

    class A,G,BB startEnd;
    class B,H,K,L,M,O,R,V,W,X,Y,Z,AI,AJ,AK,AL,AM,AN,AO,AR,AU,AV process;
    class C,E,I,N,P,T,AA,AE,AP,AS,AW,AZ decision;
    class J,Q,U,AB,AQ,AX error;
    class AT,AY,BA success;