SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

How do you run the example with GPU? #18

Closed trrahul closed 1 year ago

trrahul commented 1 year ago

Setting n_gpu_layers seems to have no effect. How do you run the example on the GPU?

AsakusaRinne commented 1 year ago

Please install LLamaSharp.Backend.Cuda11 or LLamaSharp.Backend.Cuda12 first, then set n_gpu_layers to a positive number.

Generally, if nothing goes wrong, the output when loading the model will include a line like "offload 20 layers to gpu".
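A quick way to apply this check to a captured load log is sketched below. It is only a heuristic: the exact wording of the offload message differs between llama.cpp versions, so the helper (a hypothetical name, `MentionsGpuOffload`) just looks for a line containing both "offload" and "gpu", ignoring case. The sample log lines are shortened from this thread.

```csharp
using System;
using System.Linq;

// Heuristic: does the captured load log contain a line that mentions
// offloading layers to the GPU? Matching is case-insensitive because the
// exact message text is not guaranteed to be stable across versions.
static bool MentionsGpuOffload(string log) =>
    log.Split('\n').Any(line =>
        line.Contains("offload", StringComparison.OrdinalIgnoreCase) &&
        line.Contains("gpu", StringComparison.OrdinalIgnoreCase));

// A CPU-only load log (two lines taken from the output posted in this thread).
string cpuOnlyLog =
    "llama_model_load_internal: n_layer    = 32\n" +
    "llama_model_load_internal: mem required  = 5407.72 MB (+ 1026.00 MB per state)";

// The same log with the kind of line described above appended.
string gpuLog = cpuOnlyLog + "\noffload 20 layers to gpu";

Console.WriteLine(MentionsGpuOffload(cpuOnlyLog)); // False
Console.WriteLine(MentionsGpuOffload(gpuLog));     // True
```

If the check returns false for the real log, the native library that was loaded most likely has no GPU support compiled in, or it was not the CUDA build at all.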

trrahul commented 1 year ago

I installed the CUDA package into the Llama.Examples project and set n_gpu_layers to 100. It is still not using the GPU.

AsakusaRinne commented 1 year ago

Could you please provide the output you get when running the program (the output printed while loading the model)? It would also help if you could share your system info here.

trrahul commented 1 year ago

Sure.

0: Run a chat session.
1: Run a LLamaModel to chat.
2: Quantize a model.
3: Get the embeddings of a message.
4: Run a LLamaModel with instruct mode.
5: Load and save state of LLamaModel.

Your choice: 0
Please input your model path: C:\Users\rahul\Downloads\wizardLM-7B.ggmlv3.q4_0.bin
llama.cpp: loading model from C:\Users\rahul\Downloads\wizardLM-7B.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.72 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB
---------------
Display Devices
---------------
           Card name: NVIDIA GeForce GTX 1650
        Manufacturer: NVIDIA
           Chip type: GeForce GTX 1650
            DAC type: Integrated RAMDAC
         Device Type: Full Device (POST)
          Device Key: Enum\PCI\VEN_10DE&DEV_1F91&SUBSYS_3FFB17AA&REV_A1
       Device Status: 0180200A [DN_DRIVER_LOADED|DN_STARTED|DN_DISABLEABLE|DN_NT_ENUMERATOR|DN_NT_DRIVER] 
 Device Problem Code: No Problem
 Driver Problem Code: Unknown
      Display Memory: 12109 MB
    Dedicated Memory: 3962 MB
       Shared Memory: 8147 MB
        Current Mode: 1920 x 1080 (32 bit) (60Hz)
         HDR Support: Not Supported
    Display Topology: Internal
 Display Color Space: DXGI_COLOR_SPACE_RGB_FULL_G22_NONE_P709
     Color Primaries: Red(0.589844,0.370117), Green(0.349609,0.554688), Blue(0.155273,0.110352), White Point(0.313477,0.329102)
   Display Luminance: Min Luminance = 0.500000, Max Luminance = 270.000000, MaxFullFrameLuminance = 270.000000
        Monitor Name: Generic PnP Monitor
       Monitor Model: unknown
          Monitor Id: LGD05E5
         Native Mode: 1920 x 1080(p) (59.977Hz)
         Output Type: Displayport Embedded
Monitor Capabilities: HDR Not Supported
Display Pixel Format: DISPLAYCONFIG_PIXELFORMAT_32BPP
      Advanced Color: Not Supported
         Driver Name: C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_5adc6075318430cf\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_5adc6075318430cf\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_5adc6075318430cf\nvldumdx.dll,C:\WINDOWS\System32\DriverStore\FileRepository\nvlt.inf_amd64_5adc6075318430cf\nvldumdx.dll
 Driver File Version: 27.21.0014.6230 (English)
      Driver Version: 27.21.14.6230
         DDI Version: 12
      Feature Levels: 12_1,12_0,11_1,11_0,10_1,10_0,9_3,9_2,9_1
        Driver Model: WDDM 2.7
 Hardware Scheduling: Supported:True Enabled:False 
 Graphics Preemption: Pixel
  Compute Preemption: Dispatch
            Miracast: Not Supported
      Detachable GPU: No
 Hybrid Graphics GPU: Not Supported
      Power P-states: Not Supported
      Virtualization: Paravirtualization 
          Block List: No Blocks
  Catalog Attributes: Universal:False Declarative:True 
   Driver Attributes: Final Retail
    Driver Date/Size: 05-04-2021 05:30:00 AM, 1049064 bytes
         WHQL Logo'd: n/a
     WHQL Date Stamp: n/a
   Device Identifier: {D7B71E3E-5CD1-11CF-4C6C-F51F1BC2D635}
           Vendor ID: 0x10DE
           Device ID: 0x1F91
           SubSys ID: 0x3FFB17AA
         Revision ID: 0x00A1
  Driver Strong Name: oem7.inf:0f066de306961689:Section171:27.21.14.6230:pci\ven_10de&dev_1f91&subsys_3ffb17aa
      Rank Of Driver: 00CF0001
         Video Accel: 
         DXVA2 Modes: {86695F12-340E-4F04-9FD3-9253DD327460}  DXVA2_ModeMPEG2_VLD  {6F3EC719-3735-42CC-8063-65CC3CB36616}  DXVA2_ModeVC1_D2010  DXVA2_ModeVC1_VLD  {32FCFE3F-DE46-4A49-861B-AC71110649D5}  DXVA2_ModeH264_VLD_Stereo_Progressive_NoFGT  DXVA2_ModeH264_VLD_Stereo_NoFGT  DXVA2_ModeH264_VLD_NoFGT  DXVA2_ModeHEVC_VLD_Main  DXVA2_ModeHEVC_VLD_Main10  {20BB8B0A-97AA-4571-8E99-64E60606C1A6}  {15DF9B21-06C4-47F1-841E-A67C97D7F312}  DXVA2_ModeMPEG4pt2_VLD_Simple  DXVA2_ModeMPEG4pt2_VLD_AdvSimple_NoGMC  {9947EC6F-689B-11DC-A320-0019DBBC4184}  {33FCFE41-DE46-4A49-861B-AC71110649D5}  DXVA2_ModeVP9_VLD_Profile0  DXVA2_ModeVP9_VLD_10bit_Profile2  {DDA19DC7-93B5-49F5-A9B3-2BDA28A2CE6E}  {6AFFD11E-1D96-42B1-A215-93A31F09A53D}  {914C84A3-4078-4FA9-984C-E2F262CB5C9C}  
   Deinterlace Caps: {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(YUY2,YUY2) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_PixelAdaptive 
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(YUY2,YUY2) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(YUY2,YUY2) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(YUY2,YUY2) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_BOBVerticalStretch 
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(UYVY,UYVY) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_PixelAdaptive 
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(UYVY,UYVY) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(UYVY,UYVY) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(UYVY,UYVY) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_BOBVerticalStretch 
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(YV12,0x32315659) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_PixelAdaptive 
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(YV12,0x32315659) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(YV12,0x32315659) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(YV12,0x32315659) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_BOBVerticalStretch 
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(NV12,0x3231564e) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_PixelAdaptive 
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(NV12,0x3231564e) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(NV12,0x3231564e) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY 
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(NV12,0x3231564e) Frames(Prev/Fwd/Back)=(0,0,0) Caps=VideoProcess_YUV2RGB VideoProcess_StretchX VideoProcess_StretchY DeinterlaceTech_BOBVerticalStretch 
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(IMC1,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(IMC1,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(IMC1,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(IMC1,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(IMC2,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(IMC2,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(IMC2,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(IMC2,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(IMC3,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(IMC3,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(IMC3,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(IMC3,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(IMC4,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(IMC4,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(IMC4,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(IMC4,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(S340,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(S340,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(S340,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(S340,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {6CB69578-7617-4637-91E5-1C02DB810285}: Format(In/Out)=(S342,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {F9F19DA5-3B09-4B2F-9D89-C64753E3EAAB}: Format(In/Out)=(S342,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {5A54A0C9-C7EC-4BD9-8EDE-F3C75DC4393B}: Format(In/Out)=(S342,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
                     {335AA36E-7884-43A4-9C91-7F87FAF3E37E}: Format(In/Out)=(S342,UNKNOWN) Frames(Prev/Fwd/Back)=(0,0,0) Caps=
        D3D9 Overlay: Supported
             DXVA-HD: Supported
        DDraw Status: Enabled
          D3D Status: Enabled
          AGP Status: Enabled
       MPO MaxPlanes: 4
            MPO Caps: RGB,YUV,BILINEAR,HIGH_FILTER,STRETCH_YUV,STRETCH_RGB,IMMEDIATE,HDR (MPO3)
         MPO Stretch: 10.000X - 0.500X
     MPO Media Hints: resizing, colorspace Conversion 
         MPO Formats: NV12,R16G16B16A16_FLOAT,R10G10B10A2_UNORM,R8G8B8A8_UNORM,B8G8R8A8_UNORM
    PanelFitter Caps: RGB,YUV,BILINEAR,HIGH_FILTER,STRETCH_YUV,STRETCH_RGB,IMMEDIATE,HDR (MPO3)
 PanelFitter Stretch: 10.000X - 0.500X

AsakusaRinne commented 1 year ago

What's your CUDA version? Since you installed both packages at the same time, one of them will overwrite the other, so the CUDA version of your system and that of the LLamaSharp.Backend package may not match.

trrahul commented 1 year ago

I have CUDA 12 installed. I removed the Cuda11 package and tested again. No luck.
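Besides the toolkit version, the installed display driver also limits which CUDA runtime can be used. The sketch below converts the "Driver Version" value from the system info dump above into an NVIDIA release number and compares it with a minimum. Two things are assumptions here, not facts from this thread: the convention that the last five digits of the Windows driver version encode the NVIDIA release, and the minimum of 527.41 for CUDA 12 on Windows, which should be checked against NVIDIA's CUDA release notes. `NvidiaRelease` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Assumption: the last five digits of the Windows driver version encode the
// NVIDIA release number, e.g. "27.21.14.6230" -> 462.30.
static decimal NvidiaRelease(string windowsDriverVersion)
{
    // Join the last two dot-separated fields: "14" + "6230" -> "146230".
    string digits = string.Concat(windowsDriverVersion.Split('.').TakeLast(2));
    // Keep the last five digits and put the decimal point after the third.
    string lastFive = digits.Substring(digits.Length - 5);
    return decimal.Parse(lastFive.Insert(3, "."), CultureInfo.InvariantCulture);
}

// Assumed minimum Windows driver release for a CUDA 12 runtime (verify this).
decimal assumedMinimumForCuda12 = 527.41m;

decimal installed = NvidiaRelease("27.21.14.6230"); // value from the dump above
Console.WriteLine(installed >= assumedMinimumForCuda12
    ? "Driver looks new enough for a CUDA 12 backend."
    : "Driver may be too old for a CUDA 12 backend; consider updating it.");
```

If the assumptions hold, the driver in the dump would be older than what a CUDA 12 build expects, which would be worth ruling out before rebuilding anything.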

trrahul commented 1 year ago

My GPU's CUDA Compute Capability is 7.5 and I think it will only work with CUDA version 10. (https://stackoverflow.com/a/28933055)
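For reference, a compute capability generally sets the earliest toolkit version that supports a GPU rather than the only one, so 7.5 is not limited to CUDA 10. The sketch below encodes that idea with a small lookup table. The table values (the oldest compute capability each CUDA major version still supports, and CUDA 10 as the first version to support 7.5) are assumptions to be checked against NVIDIA's documentation, and `IsSupported` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Assumed: the oldest compute capability each CUDA major version still supports.
var oldestSupported = new Dictionary<int, Version>
{
    [10] = new Version(3, 0),
    [11] = new Version(3, 5),
    [12] = new Version(5, 0),
};

// Assumed: the first CUDA major version that supports compute capability 7.5.
const int firstCudaWithTuring = 10;

// A capability is treated as supported if the toolkit is new enough to know
// the architecture and has not yet dropped it.
bool IsSupported(int cudaMajor, Version computeCapability) =>
    cudaMajor >= firstCudaWithTuring &&
    oldestSupported.TryGetValue(cudaMajor, out var oldest) &&
    computeCapability >= oldest;

var gtx1650 = new Version(7, 5);
foreach (int major in new[] { 10, 11, 12 })
    Console.WriteLine($"CUDA {major}: compute capability {gtx1650} supported = {IsSupported(major, gtx1650)}");
```

Under these assumptions all three lines report True, so the compute capability alone would not explain why the CUDA 11 or CUDA 12 backend fails to use the GPU.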

AsakusaRinne commented 1 year ago

Would you like to try building llama.cpp from source? I could help you with that. If it builds from source on your machine, it should work on your computer.

trrahul commented 1 year ago

Thank you. It was running fine on my CPU, and I wanted to know how it switches to the GPU. I saw the CUDA DLLs in the runtime folder, but the code was

https://github.com/SciSharp/LLamaSharp/blob/e603a0913735660b3b4310b3c88178f6421dcd48/LLama/Native/NativeApi.cs#L29

Shouldn't it be private const string libraryName = "libllama-cuda11"; to load the GPU version?

I changed it to that and ran the example, and that is when I got an interop exception, which led me to believe there might be something wrong with my CUDA version. I thought of compiling it myself but haven't had enough time to try it yet.

AsakusaRinne commented 1 year ago

In the LLamaSharp.Backend packages, the DLLs are all renamed to libllama.dll. When using the master branch, you need to change the filename to choose which DLL you want to use. What does the interop exception look like?
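To see which native library file is actually present and loadable next to the executable, a small probe like the one below can help. It is a sketch, not part of LLamaSharp: `FirstLoadable` is a hypothetical helper, and the file name "libllama-cuda11.dll" is an assumption based on the library name mentioned earlier in this thread. It relies only on `NativeLibrary.TryLoad` from .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Returns the path of the first candidate file in `directory` that the OS
// loader accepts, or null if none exists or none can be loaded.
static string? FirstLoadable(string directory, params string[] fileNames)
{
    foreach (var name in fileNames)
    {
        var path = Path.Combine(directory, name);
        if (!File.Exists(path)) continue;

        if (NativeLibrary.TryLoad(path, out IntPtr handle))
        {
            NativeLibrary.Free(handle); // only probing, so release it again
            return path;
        }

        // A file that exists but does not load often has missing dependencies,
        // for example CUDA runtime DLLs that are not installed.
        Console.WriteLine($"{name} exists but failed to load.");
    }
    return null;
}

// Candidate names are assumptions; adjust them to the files in your output folder.
var found = FirstLoadable(AppContext.BaseDirectory, "libllama.dll", "libllama-cuda11.dll");
Console.WriteLine(found ?? "No loadable llama library found next to the executable.");
```

A file that exists but fails to load here would be consistent with an interop exception at the first native call, which narrows the problem down to the DLL's own dependencies rather than the C# code.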

martindevans commented 1 year ago

I'll close this issue since it's been inactive for so long, but feel free to comment here if it's still an issue and I'll reopen it.