jaylong255 commented 2 days ago

[ ] Install CUDA Toolkit
[ ] Install cuDNN
[ ] Install NVIDIA Drivers
[ ] Set Up Python Environment

jaylong255 commented 2 days ago

Here's a high-level overview of preparing a laptop with an NVIDIA GeForce GPU running Ubuntu to run LLaMA models:

Install CUDA Toolkit:
- Download and install the CUDA Toolkit from NVIDIA's website. This is essential for GPU acceleration. Ensure you pick a version compatible with your GPU and Ubuntu version.
Install cuDNN:
- Obtain cuDNN, which provides primitives for deep networks. You'll need to sign up on NVIDIA's developer site to download this. Make sure it matches your CUDA version.
Install NVIDIA Drivers:
- Ensure you have a recent NVIDIA driver, new enough for the CUDA version you plan to use. It might already be installed if you're using the proprietary drivers from the Ubuntu repositories or directly from NVIDIA.
Set Up Python Environment:
- Install Python (3.9 or newer; recent PyTorch releases no longer support 3.8).
- Use conda or venv to create a virtual environment to manage dependencies without cluttering your system.
Install Necessary Libraries:
- pip install torch torchvision torchaudio - PyTorch with CUDA support for GPU acceleration.
- pip install transformers - Hugging Face's Transformers library, which supports LLaMA and many other models.
Download the LLaMA Model:
- LLaMA weights are gated by Meta's license: you must request access (for example, by accepting the license on the model's Hugging Face page) before you can download them.
- Use huggingface-cli login and huggingface-cli download (from the huggingface_hub package), or download manually from a source you have access to.

Model Loading and Running:

Use the transformers library to load the model:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-llama-model-path"  # Replace with a local path or a Hugging Face model ID you have access to
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load weights in 16-bit; the fp32 default doubles memory use and rarely fits a laptop GPU
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.to("cuda")  # Move model to GPU
```

Generate text with:

```
inputs = tokenizer("Your input text", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)  # cap the number of generated tokens explicitly
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Optimize for GPU Usage:
- If running into memory issues, consider using 8-bit quantization, offloading, or model parallelism techniques provided by libraries like bitsandbytes or deepspeed.
Testing and Tweaking:
- Run some tests, maybe start with smaller models or shorter sequences to ensure everything works.
- Adjust batch sizes or sequence lengths based on your GPU's memory.

Remember, running large models like LLaMA, especially the bigger variants, might require significant GPU memory. If your GeForce GPU has limited VRAM, you might need to use model parallelism or offloading strategies, or consider models that are optimized for lower memory usage.
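The memory figures above can be estimated before downloading anything. The Python sketch below gives a rough lower bound under stated assumptions: weights cost parameters × bytes per parameter (4 for fp32, 2 for fp16, 1 for 8-bit, 0.5 for 4-bit), the KV cache for standard multi-head attention costs 2 × layers × hidden size × sequence length × batch × bytes per value, and activations and framework overhead are ignored. The 7B-parameter, 32-layer, 4096-hidden shape and the 8 GiB card are illustrative assumptions, not measurements.

```python
# Rough VRAM estimate for LLM inference: weights + KV cache only.
# Assumption: activations, CUDA context, and allocator overhead are ignored,
# so treat the result as a lower bound.

GIB = 1024 ** 3

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


def kv_cache_bytes(layers: int, hidden: int, seq_len: int, batch: int,
                   bytes_per_value: float = 2.0) -> float:
    """KV cache for standard multi-head attention: one K and one V tensor
    of shape (batch, seq_len, hidden) per layer."""
    return 2 * layers * hidden * seq_len * batch * bytes_per_value


def fits(total_bytes: float, vram_gib: float, headroom: float = 0.9) -> bool:
    """True if the estimate fits in the given VRAM while keeping ~10% spare."""
    return total_bytes <= vram_gib * GIB * headroom


if __name__ == "__main__":
    params = 7e9  # nominal 7B model (illustrative)
    kv = kv_cache_bytes(layers=32, hidden=4096, seq_len=2048, batch=1)
    for dtype in ("fp32", "fp16", "int8", "int4"):
        total = weight_bytes(params, dtype) + kv
        print(f"{dtype}: ~{total / GIB:.1f} GiB, fits in 8 GiB: {fits(total, 8)}")
```

Under these assumptions, 16-bit weights alone (about 13 GiB) already exceed an 8 GiB laptop GPU, and only the 4-bit variant fits with headroom, which is why quantization or offloading comes up so often on laptops.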

Also, ensure your laptop's power settings are configured for performance to avoid thermal throttling during extended model runs.

jaylong255 commented 1 day ago

Enable SSH


```
sudo apt install -y openssh-server
sudo systemctl enable ssh
```

jaylong255 commented 1 day ago

```
sudo apt install -y build-essential libssl-dev dkms
```

jaylong255 commented 1 day ago

```
wget https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.35.03_linux.run
```

jaylong255 commented 1 day ago

```
sudo sh cuda_12.6.1_560.35.03_linux.run
```

jaylong255 commented 1 day ago

Here's a step-by-step guide on installing the CUDA Toolkit on an Ubuntu system with an NVIDIA GeForce GPU:

Step 1: Determine Your System Specifications

Ubuntu Version: Check what version of Ubuntu you're running (lsb_release -a).
GPU Compatibility: Identify your GPU model with lspci | grep -i nvidia (or nvidia-smi, if a driver is already installed). Ensure it's compatible with the CUDA version you plan to install.

Step 2: Download the CUDA Toolkit

Visit the NVIDIA CUDA Download page.
Choose a version compatible with your Ubuntu version and GPU. As of 2024, the current release line is CUDA 12.x (this thread uses 12.6.1), but check for the latest.
Download the .run file for your Ubuntu version (e.g., cuda_12.6.1_560.35.03_linux.run).
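As the filenames in this thread show, a runfile name encodes both the toolkit version and the version of the driver bundled with it (cuda_12.6.1_560.35.03_linux.run carries toolkit 12.6.1 and driver 560.35.03). A small, hypothetical Python helper that splits the name can help when scripting downloads or when checking which driver the installer will try to install. It assumes the cuda_<toolkit>_<driver>_linux.run pattern seen here; other packaging may differ.

```python
import re
from typing import NamedTuple


class RunfileInfo(NamedTuple):
    toolkit: str  # e.g. "12.6.1"
    driver: str   # e.g. "560.35.03"


# Assumed naming pattern: cuda_<toolkit>_<driver>_linux.run
_RUNFILE_RE = re.compile(r"^cuda_(\d+\.\d+\.\d+)_(\d+\.\d+(?:\.\d+)?)_linux\.run$")


def parse_runfile_name(name: str) -> RunfileInfo:
    """Split a CUDA runfile name into toolkit and bundled driver versions."""
    match = _RUNFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized CUDA runfile name: {name!r}")
    return RunfileInfo(toolkit=match.group(1), driver=match.group(2))


if __name__ == "__main__":
    info = parse_runfile_name("cuda_12.6.1_560.35.03_linux.run")
    print(info.toolkit, info.driver)
```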

Step 3: Install Dependencies

Before you install CUDA, make sure your system has the necessary dependencies. Open a terminal and run:
```
sudo apt update
sudo apt install -y build-essential libssl-dev dkms
```

Step 4: Install CUDA Toolkit

Navigate to the directory where you downloaded the .run file:
```
cd /path/to/downloaded/file
```
Make the file executable:
```
chmod +x cuda*.run
```
Run the installation script. This step might take some time:
```
sudo ./cuda*.run
```
Follow the prompts: accept the license agreement, then choose which components to install in the menu (the bundled driver is selected by default). The installer will handle the rest.

Step 5: Set Up Environment Variables

You need to add CUDA's bin and lib64 directories to your PATH and library search path (LD_LIBRARY_PATH). Edit your .bashrc or .bash_profile:
```
nano ~/.bashrc
```

Add these lines at the end (adjust paths based on your CUDA version and installation directory):

```
export PATH=/usr/local/cuda-<version>/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-<version>/lib64:$LD_LIBRARY_PATH
```

Save and exit (Ctrl+X, then Y, Enter).
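The two export lines simply prepend CUDA's directories to the existing search paths. The Python sketch below mirrors that logic for scripts that launch CUDA tools through subprocess. One detail it handles differently from the shell lines: if LD_LIBRARY_PATH is unset, ".../lib64:$LD_LIBRARY_PATH" leaves a trailing colon, which the dynamic linker treats as the current directory, so the sketch only adds a separator when there is something to join. The function names are hypothetical, and the default /usr/local/cuda-12.6 prefix is an assumption taken from the installer summary in this thread.

```python
import os
from typing import Dict, Mapping, Optional


def prepend_path(entry: str, current: Optional[str]) -> str:
    """Put `entry` in front of a colon-separated search path.

    Unlike the shell form "entry:$VAR", an unset or empty VAR does not
    leave a trailing ':' (an empty element means "current directory").
    """
    if not current:
        return entry
    return f"{entry}:{current}"


def cuda_env(env: Mapping[str, str], prefix: str = "/usr/local/cuda-12.6") -> Dict[str, str]:
    """Copy of `env` with CUDA's bin/ and lib64/ prepended, mirroring the
    two export lines added to ~/.bashrc."""
    new_env = dict(env)
    new_env["PATH"] = prepend_path(f"{prefix}/bin", env.get("PATH"))
    new_env["LD_LIBRARY_PATH"] = prepend_path(f"{prefix}/lib64", env.get("LD_LIBRARY_PATH"))
    return new_env


if __name__ == "__main__":
    env = cuda_env(os.environ)
    print("PATH starts with:", env["PATH"].split(":")[0])
    print("LD_LIBRARY_PATH:", env["LD_LIBRARY_PATH"])
```

Passing the result as env= to subprocess.run gives child processes the same search paths without reloading ~/.bashrc.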

Step 6: Apply Changes

Reload your .bashrc:
```
source ~/.bashrc
```

Step 7: Verify Installation

Check if CUDA is correctly installed by running:
```
nvcc --version
```
This should print the version of the CUDA Toolkit you installed.
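If you script this check (for example, in a setup script for the machine), the release number can be pulled out of the nvcc --version output. A minimal Python sketch, assuming the output contains a line of the form "Cuda compilation tools, release X.Y, ..." as current nvcc versions print; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches "release X.Y" in the "Cuda compilation tools, ..." line of nvcc output.
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = _RELEASE_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH; return None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found: check PATH (Step 5) or the installation")
    else:
        print("CUDA toolkit release %d.%d" % release)
```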

Step 8: Install NVIDIA Drivers (if not already installed)

If you're not already using NVIDIA drivers, install them:

```
sudo apt update
sudo apt install nvidia-driver-<version>  # Replace <version> with the appropriate driver version for your GPU
sudo reboot
```

After reboot, check if the driver is loaded with nvidia-smi.
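For a scriptable version of this check, nvidia-smi can print selected fields as CSV with --query-gpu=name,driver_version --format=csv,noheader. The Python sketch below parses that output; an empty result or a failing nvidia-smi suggests the driver is not loaded. The names GpuInfo, parse_gpu_query, and query_gpus are hypothetical.

```python
import shutil
import subprocess
from typing import List, NamedTuple


class GpuInfo(NamedTuple):
    name: str
    driver_version: str


QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_query(output: str) -> List[GpuInfo]:
    """Parse one 'name, driver_version' line per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        # Split on the last comma so the GPU name stays intact.
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append(GpuInfo(name, driver))
    return gpus


def query_gpus() -> List[GpuInfo]:
    """Empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except subprocess.CalledProcessError:
        gpus = []  # nvidia-smi exists but cannot communicate with the driver
    for gpu in gpus:
        print(f"{gpu.name}: driver {gpu.driver_version}")
    if not gpus:
        print("No working NVIDIA driver detected")
```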

Step 9: Test with a Simple CUDA Program

Create a simple CUDA program:

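```
cat <<'EOF' > hello.cu
#include <stdio.h>

// Kernel: prints a greeting from the GPU.
__global__ void helloFromGPU(void) {
  printf("Hello World from GPU!\n");
}

int main(void) {
  helloFromGPU<<<1, 1>>>();               // launch 1 block of 1 thread
  cudaError_t err = cudaGetLastError();   // catches launch errors, e.g. no usable driver
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // wait for the kernel; flushes printf
  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
EOF
```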

Compile and run:
```
nvcc hello.cu -o hello
./hello
```
You should see "Hello World from GPU!" if everything is set up correctly.

This installation prepares your system to use NVIDIA GPUs for CUDA-accelerated computing tasks, including machine learning with PyTorch, TensorFlow, or other CUDA-enabled libraries. Remember that the CUDA version must be supported by your GPU and driver, and must match what other software such as cuDNN or TensorFlow expects, so always check for compatibility.

jaylong255 commented 1 day ago

If the CUDA installer warns that it found an existing driver installation, it's because it detected a driver installed through your package manager (like apt on Ubuntu). Here's how to proceed:

Option 1: Remove Existing Driver and Continue with CUDA's Driver

Remove the Existing Driver:
```
sudo apt remove 'nvidia-*'
sudo reboot
```
After the reboot, rerun the CUDA installer. This lets the installer handle the driver installation, ensuring compatibility with the CUDA version you're installing.

Option 2: Keep Existing Driver and Install Only the CUDA Toolkit

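If you prefer to keep your current NVIDIA drivers (perhaps because they're newer or better optimized for your system), choose Continue at this prompt, then deselect the Driver component in the installer's menu so that only the CUDA Toolkit is installed. For an unattended install, the runfile's --silent --toolkit options do the same.
Note: Ensure your existing driver is compatible with the CUDA version you're installing. Check NVIDIA's compatibility matrix or documentation.
Post-Installation Steps:
- With the Driver component deselected, the CUDA Toolkit is installed without touching your existing drivers. If Driver is left selected, the installer still tries to install its bundled driver, which can fail while a package-manager driver is present.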

General Advice:

Compatibility: Always check if the existing driver version matches or is compatible with the CUDA Toolkit version. Mismatches can lead to issues with CUDA applications or system instability.
Driver Updates: If you later decide to update your NVIDIA drivers, ensure you update CUDA as well if necessary, or risk version mismatches.
Performance: NVIDIA's installer often provides a smoother experience with CUDA, which is why removing the existing driver and letting NVIDIA handle it is generally recommended.
Documentation: Refer to NVIDIA's official CUDA installation guide for your specific version, as steps might vary slightly with each release.
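A simplified version of the compatibility rule can be checked in code: the CUDA version reported by the driver (the CUDA Version field in the nvidia-smi header) should be at least the toolkit's major.minor release. This Python sketch assumes only that rule; NVIDIA's CUDA Compatibility documentation describes further cases (minor-version and forward compatibility) that it ignores. The function names are hypothetical.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """'12.6.1' -> (12, 6, 1); tuples compare numerically, unlike strings."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_toolkit(driver_cuda_version: str, toolkit_version: str) -> bool:
    """Simplified rule: the driver's reported CUDA version must be at least
    the toolkit's major.minor release."""
    driver = version_tuple(driver_cuda_version)[:2]
    toolkit = version_tuple(toolkit_version)[:2]
    return driver >= toolkit


if __name__ == "__main__":
    print(driver_supports_toolkit("12.6", "12.6.1"))
    print(driver_supports_toolkit("12.2", "12.6.1"))
```

Comparing the raw strings would go wrong: "12.10" < "12.6" lexicographically, while (12, 10) > (12, 6) as tuples.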

Post-Installation Verification:

Regardless of which option you choose, after installation:

Verify CUDA with nvcc --version.
Check if your GPU is recognized with nvidia-smi.
If everything seems correct but you face issues later, remember that driver and CUDA version mismatches are common culprits.

By choosing the appropriate option, you can ensure a smooth installation of the CUDA Toolkit tailored to your needs or circumstances.

jaylong255 commented 1 day ago

```
jay@blast-Breaker:~$ sudo ./cuda_12.6.1_560.35.03_linux.run
[sudo] password for jay:
Installation failed. See log at /var/log/cuda-installer.log for details.
jay@blast-Breaker:~$ cat /var/log/cuda-installer.log
INFO: Driver installation detected by command: apt list --installed | grep -e nvidia-driver-[0-9][0-9][0-9] -e nvidia-[0-9][0-9]0-9: Cleaning up window
INFO: Checking compiler version...
INFO: gcc location: /usr/bin/gcc
INFO: gcc version: gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
INFO: Initializing menu
INFO: Setup complete
INFO: Installing: Driver
INFO: Installing: 560.35.03
INFO: Executing NVIDIA-Linux-x86_64-560.35.03.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
INFO: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 560.35.03 failed, quitting
jay@blast-Breaker:~$
```

jaylong255 commented 1 day ago

From the log, it appears that the installation of the CUDA Toolkit failed specifically during the driver installation phase. Here's what you can do next:

Step 1: Check Driver Installation Logs

The log mentions another log file for more detailed information:
```
cat /var/log/nvidia-installer.log
```
- Look for specific errors or messages that might give clues about why the driver failed to install.
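Both installer logs can be long. A small Python filter, similar to grep -iwE 'error|warning', keeps only the lines that mention ERROR or WARNING. The function name is hypothetical, and the log paths are the two mentioned in this thread.

```python
import re
from pathlib import Path
from typing import List

# Matches tags such as "[ERROR]:" or "WARNING:" and compiler "error:" lines.
_PROBLEM_RE = re.compile(r"\b(ERROR|WARNING)\b", re.IGNORECASE)


def problem_lines(log_text: str) -> List[str]:
    """Return the log lines that mention ERROR or WARNING, stripped."""
    return [line.strip() for line in log_text.splitlines() if _PROBLEM_RE.search(line)]


if __name__ == "__main__":
    for path in ("/var/log/cuda-installer.log", "/var/log/nvidia-installer.log"):
        log = Path(path)
        if not log.exists():
            continue
        try:
            text = log.read_text(errors="replace")
        except PermissionError:
            print(f"== {path}: permission denied (try running with sudo)")
            continue
        print(f"== {path}")
        for line in problem_lines(text):
            print(line)
```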

Step 2: Remove Existing NVIDIA Drivers

As previously suggested, the presence of existing NVIDIA drivers might be causing the issue. Remove them:
```
sudo apt remove --purge 'nvidia-*'
sudo apt autoremove
sudo reboot
```

Step 3: Install Required Dependencies

Ensure all necessary packages are installed:

```
sudo apt update
sudo apt install -y build-essential dkms libssl-dev
```

Step 4: Rerun the CUDA Installer

Try running the CUDA installer again:
```
sudo ./cuda_12.6.1_560.35.03_linux.run
```

Step 5: If It Still Fails

Check Kernel Compatibility: Ensure your kernel version is compatible with the CUDA driver. You might need to install a specific kernel version or use a driver version that supports your kernel.
- Check your kernel version:
```
uname -r
```
- NVIDIA's CUDA Installation Guide for Linux lists the kernel version each supported distribution was tested with; check whether yours matches.
Alternative Driver Installation: Sometimes downloading and manually installing the NVIDIA driver from NVIDIA's website before CUDA can help.
- Download the appropriate .run file for your GPU from NVIDIA's driver download page.
- Install it:
```
sudo ./NVIDIA-Linux-x86_64-<version>.run
```
- Follow prompts, then reboot.
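The kernel check from this step can also be scripted. The Python sketch below turns a uname -r string such as 6.8.0-45-generic into a comparable tuple. The newest supported kernel is a hypothetical input you would take from NVIDIA's documentation for your CUDA version, since it changes between releases; the function names are also hypothetical.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int, int]:
    """'6.8.0-45-generic' -> (6, 8, 0); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unexpected kernel release string: {release!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def kernel_at_most(release: str, newest_supported: Tuple[int, int, int]) -> bool:
    """True if the running kernel is not newer than the newest version
    listed as supported (supplied by you from NVIDIA's docs)."""
    return kernel_version(release) <= newest_supported


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    print(release, kernel_version(release))
```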

Step 6: Verify Installation

After successfully installing (either through the CUDA installer or manually):
- Reboot your system:
```
sudo reboot
```
- Then, verify:
```
nvidia-smi
```
- This should list your GPU and driver version if successful.

Additional Tips:

Backup: Before removing drivers or kernels, back up important data.
Blacklist Nouveau: Sometimes, the Nouveau open-source driver interferes:
- Open or create /etc/modprobe.d/blacklist-nouveau.conf:
```
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
```
  Add:
```
blacklist nouveau
options nouveau modeset=0
```
- Then reboot.
Check Logs Again: If you encounter issues after following these steps, re-examine the logs, especially /var/log/syslog or /var/log/dmesg for any NVIDIA or kernel-related errors post-reboot.
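After rebooting, you can confirm which GPU kernel module is active. lsmod reads /proc/modules, and the Python sketch below does the same check directly: if nouveau is still listed, the blacklist did not take effect (on Ubuntu, running sudo update-initramfs -u before rebooting is commonly needed for that). The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Set

GPU_MODULES = ("nouveau", "nvidia")


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def gpu_modules_loaded(proc_modules_text: str) -> List[str]:
    """Which of nouveau/nvidia are currently loaded, in a fixed order."""
    mods = loaded_modules(proc_modules_text)
    return [name for name in GPU_MODULES if name in mods]


if __name__ == "__main__":
    proc = Path("/proc/modules")
    text = proc.read_text() if proc.exists() else ""
    loaded = gpu_modules_loaded(text)
    print("GPU modules loaded:", ", ".join(loaded) if loaded else "none")
```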

If these steps don't resolve the issue, consider checking NVIDIA's forums or support resources for any known issues with your specific hardware or kernel version. Sometimes, specific hardware configurations or kernel versions might have unique requirements or bugs.

jaylong255 commented 1 day ago

```
jay@blast-Breaker:~$ sudo ./cuda_12.6.1_560.35.03_linux.run

= Summary =

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.6/

Please make sure that
PATH includes /usr/local/cuda-12.6/bin
LD_LIBRARY_PATH includes /usr/local/cuda-12.6/lib64, or, add /usr/local/cuda-12.6/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
```

Cyberworld-builders / ml-blast-breaker

ML Stack - Blast Breaker Machine #1
