Cyberworld-builders / ml-blast-breaker

Launching an ML stack on an old gaming machine.

ML Stack - Blast Breaker Machine #1

Open jaylong255 opened 2 days ago

jaylong255 commented 2 days ago
  1. [ ] Install CUDA Toolkit
  2. [ ] Install cuDNN
  3. [ ] Install NVIDIA Drivers
  4. [ ] Set Up Python Environment
jaylong255 commented 2 days ago

Here's a high-level overview of preparing a laptop with an NVIDIA GeForce GPU running Ubuntu to run LLaMA models:

  1. Install CUDA Toolkit:

    • Download and install the CUDA Toolkit from NVIDIA's website. This is essential for GPU acceleration. Ensure you pick a version compatible with your GPU and Ubuntu version.
  2. Install cuDNN:

    • Obtain cuDNN, which provides primitives for deep networks. You'll need to sign up on NVIDIA's developer site to download this. Make sure it matches your CUDA version.
  3. Install NVIDIA Drivers:

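    • Ensure you have an NVIDIA driver recent enough for the CUDA version you picked. One may already be installed if you're using the proprietary drivers from the Ubuntu repositories or directly from NVIDIA, and the CUDA runfile installer can also install a bundled driver.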
  4. Set Up Python Environment:

    • Install Python (3.8 or higher is recommended for recent libraries).
    • Use conda or venv to create a virtual environment to manage dependencies without cluttering your system.
  5. Install Necessary Libraries:

    • pip install torch torchvision torchaudio - PyTorch with CUDA support for GPU acceleration.
    • pip install transformers - Hugging Face's Transformers library, which supports LLaMA and many other models.
  6. Download the LLaMA Model:

    • LLaMA weights are gated rather than openly downloadable: you need to request access and accept the license terms before you can download them (some variants involve academic or commercial agreements).
    • Use the huggingface-cli tool (from the huggingface_hub package) or manually download from a source you have access to.
  7. Model Loading and Running:

    • Use the transformers library to load the model:

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_name = "your-llama-model-path"  # Replace with a local path or a model ID you have access to
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      # Load weights in half precision; the float32 default needs about twice the GPU memory
      model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
      model.to("cuda")  # Move model to GPU
    • Generate text with:
      inputs = tokenizer("Your input text", return_tensors="pt").to("cuda")
      outputs = model.generate(**inputs, max_new_tokens=50)  # Cap the number of generated tokens explicitly
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  8. Optimize for GPU Usage:

    • If running into memory issues, consider using 8-bit quantization, offloading, or model parallelism techniques provided by libraries like bitsandbytes or deepspeed.
  9. Testing and Tweaking:

    • Run some tests, maybe start with smaller models or shorter sequences to ensure everything works.
    • Adjust batch sizes or sequence lengths based on your GPU's memory.
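
Before loading any model, it may help to confirm that PyTorch can actually see the GPU. The following is a minimal sketch, not part of the original notes: the function name `cuda_report` is made up for illustration, and it assumes a single GPU at index 0. It returns a plain dict so it also runs (and reports the problem) on a machine where PyTorch is not installed yet.

```python
def cuda_report():
    """Return a small dict describing what PyTorch can see of the GPU."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Assumption: the GeForce GPU is device 0 (the only GPU in the laptop).
        props = torch.cuda.get_device_properties(0)
        report["device_name"] = props.name
        report["total_vram_gib"] = round(props.total_memory / 1024**3, 2)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False` while `torch_installed` is `True`, the likely suspects are the driver install or a CPU-only PyTorch build rather than the model code.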

Remember, running large models like LLaMA, especially the bigger variants, might require significant GPU memory. If your GeForce GPU has limited VRAM, you might need to use model parallelism or offloading strategies, or consider models that are optimized for lower memory usage.
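
To get a rough feel for whether a model fits in VRAM, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below is only that arithmetic: the 7-billion-parameter figure is an illustrative size, the helper name `weight_memory_gib` is hypothetical, and the estimate covers weights only (activations and the generation cache need extra memory on top).

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    num_params = 7e9  # illustrative model size, not a measurement
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

Comparing these numbers with the GPU's total VRAM gives a quick first answer on whether quantization or offloading is needed at all.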

Also, ensure your laptop's power settings are configured for performance to avoid thermal throttling during extended model runs.
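
One way to watch for thermal throttling during a long run is to poll `nvidia-smi` for temperature and utilization. This is a hedged sketch: it assumes `nvidia-smi` is on the PATH and that the GPU reports all four queried fields (some laptop GPUs return a non-numeric placeholder for unsupported fields, which is mapped to `None` here). The function names are made up for illustration.

```python
import shutil
import subprocess

QUERY_FIELDS = ["temperature.gpu", "utilization.gpu", "memory.used", "memory.total"]


def _to_float(text):
    """Convert a CSV cell to float, or None for non-numeric placeholders."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_smi_line(line):
    """Turn one CSV line from nvidia-smi into a {field: value} dict."""
    values = [cell.strip() for cell in line.split(",")]
    return {field: _to_float(value) for field, value in zip(QUERY_FIELDS, values)}


def read_gpu_stats():
    """Query the first GPU; return None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=" + ",".join(QUERY_FIELDS),
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None
    lines = result.stdout.splitlines()
    return parse_smi_line(lines[0]) if lines else None


if __name__ == "__main__":
    print(read_gpu_stats())
```

Calling `read_gpu_stats()` in a loop with a short sleep while a model is generating shows whether the temperature climbs as utilization drops, which is the usual signature of throttling.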

jaylong255 commented 1 day ago

Enable SSH


sudo apt install -y openssh-server
sudo systemctl enable ssh

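
To confirm the SSH service is actually accepting connections after the commands above, a quick TCP check is enough. This is a small sketch, assuming the default SSH port 22 and a check run on the machine itself; `port_is_open` is a hypothetical helper name, and the host can be swapped for the machine's LAN address to test from another computer.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "127.0.0.1"  # assumption: checking from the machine itself
    print(f"SSH reachable on {host}:22 ->", port_is_open(host, 22))
```
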
jaylong255 commented 1 day ago

sudo apt install -y build-essential libssl-dev dkms

jaylong255 commented 1 day ago

wget https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.35.03_linux.run

jaylong255 commented 1 day ago

sudo sh cuda_12.6.1_560.35.03_linux.run

jaylong255 commented 1 day ago

Here's a step-by-step guide on installing the CUDA Toolkit on an Ubuntu system with an NVIDIA GeForce GPU:

Step 1: Determine Your System Specifications

Step 2: Download the CUDA Toolkit

Step 3: Install Dependencies

Step 4: Install CUDA Toolkit

Step 5: Set Up Environment Variables

Step 6: Apply Changes

Step 7: Verify Installation

Step 8: Install NVIDIA Drivers (if not already installed)

Step 9: Test with a Simple CUDA Program

This installation will prepare your system to use NVIDIA GPUs for CUDA-accelerated computing tasks, including machine learning with PyTorch, TensorFlow, or other CUDA-enabled libraries. Remember that the CUDA version must be compatible with your GPU, your driver, and libraries such as cuDNN, PyTorch, or TensorFlow, so always check compatibility before installing.
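
The verification step can be sketched as a small script that looks for `nvcc` on the PATH and reads the release number from `nvcc --version`. This is an illustration under assumptions: it relies on the version banner containing the text `release X.Y`, and the function names are made up here.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the 'X.Y' release string from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA toolkit release:", release or "nvcc not found on PATH")
```

If the script reports that `nvcc` is not found even though the installer finished, the environment variables from Step 5 are the first thing to recheck.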

jaylong255 commented 1 day ago

[Image: screenshot of the message shown by the CUDA installer]

jaylong255 commented 1 day ago

If you encounter this message during the CUDA Toolkit installation, it's because the installer has detected an existing NVIDIA driver installed through your package manager (like apt on Ubuntu). Here's how to proceed:

Option 1: Remove Existing Driver and Continue with CUDA's Driver

Option 2: Keep Existing Driver and Install Only the CUDA Toolkit

General Advice:

Post-Installation Verification:

Regardless of which option you choose, after installation:

By choosing the appropriate option, you can ensure a smooth installation of the CUDA Toolkit tailored to your needs or circumstances.
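
Before picking an option, it can help to see which driver packages apt already has installed. The installer log later in this thread shows that the installer detects them by filtering `apt list --installed` for `nvidia-driver-NNN` and `nvidia-NNN` names; the sketch below mirrors that filter in Python. The helper name and the sample package lines in the usage note are hypothetical.

```python
import re
import shutil
import subprocess

# Same two patterns the installer greps for, matched anywhere in the line.
DRIVER_PATTERN = re.compile(r"nvidia-driver-\d{3}|nvidia-\d{3}")


def find_driver_packages(apt_list_output):
    """Return package names from `apt list --installed` output that look like NVIDIA drivers."""
    packages = []
    for line in apt_list_output.splitlines():
        if DRIVER_PATTERN.search(line):
            packages.append(line.split("/", 1)[0])  # name is the part before the first "/"
    return packages


if __name__ == "__main__":
    if shutil.which("apt") is None:
        print("apt not found; this check only applies to Debian/Ubuntu systems")
    else:
        result = subprocess.run(["apt", "list", "--installed"],
                                capture_output=True, text=True)
        print(find_driver_packages(result.stdout))
```

An empty list suggests no packaged driver is present, so the installer's bundled driver should not conflict; a non-empty list names the packages Option 1 would need to remove.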

jaylong255 commented 1 day ago

jay@blast-Breaker:~$ sudo ./cuda_12.6.1_560.35.03_linux.run
[sudo] password for jay:
Installation failed. See log at /var/log/cuda-installer.log for details.
jay@blast-Breaker:~$ cat /var/log/cuda-installer.log
INFO: Driver installation detected by command: apt list --installed | grep -e nvidia-driver-[0-9][0-9][0-9] -e nvidia-[0-9][0-9][0-9]
INFO: Cleaning up window
INFO: Checking compiler version...
INFO: gcc location: /usr/bin/gcc
INFO: gcc version: gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
INFO: Initializing menu
INFO: Setup complete
INFO: Installing: Driver
INFO: Installing: 560.35.03
INFO: Executing NVIDIA-Linux-x86_64-560.35.03.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
INFO: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 560.35.03 failed, quitting
jay@blast-Breaker:~$

jaylong255 commented 1 day ago

From the log, it appears that the installation of the CUDA Toolkit failed specifically during the driver installation phase. Here's what you can do next:

Step 1: Check Driver Installation Logs

Step 2: Remove Existing NVIDIA Drivers

Step 3: Install Required Dependencies

Step 4: Rerun the CUDA Installer

Step 5: If Still Fails

Step 6: Verify Installation

Additional Tips:

If these steps don't resolve the issue, consider checking NVIDIA's forums or support resources for any known issues with your specific hardware or kernel version. Sometimes, specific hardware configurations or kernel versions might have unique requirements or bugs.
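
Step 1 above (checking the logs) can be sketched as a short script that pulls only the error lines out of the installer log. This is a minimal example, assuming errors are tagged with `[ERROR]` as in the log pasted in this thread; `extract_errors` is a hypothetical helper name, and reading files under /var/log may require sudo on some setups.

```python
from pathlib import Path


def extract_errors(log_text):
    """Return the log lines that the installer tagged as errors."""
    return [line.strip() for line in log_text.splitlines() if "[ERROR]" in line]


if __name__ == "__main__":
    # Path taken from the installer's own failure message in this thread.
    log_path = Path("/var/log/cuda-installer.log")
    try:
        text = log_path.read_text(errors="replace")
    except OSError as exc:
        print(f"Could not read {log_path}: {exc}")
    else:
        for line in extract_errors(text):
            print(line)
```

The same function works on /var/log/nvidia-installer.log if that log uses a similar tag, which is worth checking by eye first.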

jaylong255 commented 1 day ago

jay@blast-Breaker:~$ sudo ./cuda_12.6.1_560.35.03_linux.run

= Summary =

Driver: Installed
Toolkit: Installed in /usr/local/cuda-12.6/

Please make sure that

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
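
After a successful install, the shell environment usually still needs to point at the toolkit. The sketch below checks that, under stated assumptions: the install prefix /usr/local/cuda-12.6 comes from the summary above, while the `lib64` subdirectory and the need for it in `LD_LIBRARY_PATH` are assumptions based on the usual runfile layout, since the installer's own note is cut off in this thread. `path_contains` is a hypothetical helper name.

```python
import os

CUDA_HOME = "/usr/local/cuda-12.6"  # install location reported in the summary above


def path_contains(var_value, directory):
    """True if `directory` is one of the entries in an os.pathsep-separated list."""
    target = directory.rstrip("/")
    entries = [entry.rstrip("/") for entry in var_value.split(os.pathsep) if entry]
    return target in entries


if __name__ == "__main__":
    checks = {
        "PATH": f"{CUDA_HOME}/bin",
        "LD_LIBRARY_PATH": f"{CUDA_HOME}/lib64",  # assumption: standard runfile layout
    }
    for var, directory in checks.items():
        ok = path_contains(os.environ.get(var, ""), directory)
        print(f"{var} includes {directory}: {ok}")
```

A `False` result for PATH would explain `nvcc` not being found in a new shell even though the toolkit is installed.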