cocktailpeanut / dalai

The simplest way to run LLaMA on your local machine
https://cocktailpeanut.github.io/dalai

Alpaca Model - Does nothing at all.. Required Hardware? #330

Open Nocturna22 opened 1 year ago

Nocturna22 commented 1 year ago

Hello Dalai community and developers :)

Short specs (mini laptop):

CPU: 4x 1.5 GHz Intel J4105
RAM: 12 GB
GPU: UHD Graphics 600
GPU memory: 6 GB (says Task Manager, but in BIOS the Max TOLUD is 2 GB... idk)

Software versions:

Windows: 11
Python: 3.10.10
NodeJS: 18.15.0
npm: 9.6.2
Alpaca: 7B

Story:

I have installed Dalai Alpaca on my main computer. Everything has worked from the beginning. So I wanted to try it on my mini laptop as well. I installed Node.js, Python and Visual Studio with the C libraries, then I installed Dalai Alpaca 7B. The web server starts, but when I try to generate an output nothing happens and the code just exits.

Expected behavior:

main.exe launches and consumes up to 4 GB of RAM
High CPU load
Text output

Actual behavior:

I click "Go", I get an output with my input and then the code exits. I get no error message. No RAM consumption, no high CPU load.

I started Dalai with "npx -dd dalai serve" to get more info. Here is what the CMD window says:

C:\Users\Humanity\Desktop>npx -dd dalai serve
npm verb cli C:\Program Files\nodejs\node.exe C:\Users\Humanity\AppData\Roaming\npm\node_modules\npm\bin\npm-cli.js
npm info using npm@9.6.2
npm info using node@v18.15.0
npm verb title npm exec dalai serve
npm verb argv "exec" "--loglevel" "verbose" "--" "dalai" "serve"
npm verb logfile logs-max:10 dir:C:\Users\Humanity\AppData\Local\npm-cache\_logs\2023-03-30T14_17_28_515Z-
npm verb logfile C:\Users\Humanity\AppData\Local\npm-cache\_logs\2023-03-30T14_17_28_515Z-debug-0.log
npm http fetch GET 200 https://registry.npmjs.org/dalai 251ms (cache revalidated)
mkdir C:\Users\Humanity\dalai
Server running on http://localhost:3000/
 query: {
  seed: -1,
  threads: '4',
  n_predict: 200,
  top_k: 40,
  top_p: 0.9,
  temp: 0.8,
  repeat_last_n: 64,
  repeat_penalty: 1.3,
  debug: true,
  models: [ 'alpaca.7B' ],
  model: 'alpaca.7B',
  prompt: 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n' +
    '\n' +
    '### Instruction:\n' +
    'Tell me: is this just a dream?\n' +
    '\n' +
    '### Response:\n',
  id: 'TS-1680185893802-23987'
}
{ Core: 'alpaca', Model: '7B' }
exec: C:\Users\Humanity\dalai\alpaca\build\Release\main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nTell me: is this just a dream?\n\n### Response:\n" in C:\Users\Humanity\dalai\alpaca

This is the browser output in Firefox with debug enabled:

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows

PS C:\Users\Humanity\dalai\alpaca> [System.Console]::OutputEncoding=[System.Console]::InputEncoding=[System.Text.Encoding]::UTF8; C:\Users\Humanity\dalai\alpaca\build\Release\main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Tell me: is this just a dream?

### Response:
"

PS C:\Users\Humanity\dalai\alpaca> exit

Debug log

0 verbose cli C:\Program Files\nodejs\node.exe C:\Users\Humanity\AppData\Roaming\npm\node_modules\npm\bin\npm-cli.js
1 info using npm@9.6.2
2 info using node@v18.15.0
3 timing npm:load:whichnode Completed in 3ms
4 timing config:load:defaults Completed in 4ms
5 timing config:load:file:C:\Users\Humanity\AppData\Roaming\npm\node_modules\npm\npmrc Completed in 4ms
6 timing config:load:builtin Completed in 4ms
7 timing config:load:cli Completed in 5ms
8 timing config:load:env Completed in 1ms
9 timing config:load:project Completed in 3ms
10 timing config:load:file:C:\Users\Humanity\.npmrc Completed in 1ms
11 timing config:load:user Completed in 1ms
12 timing config:load:file:C:\Users\Humanity\AppData\Roaming\npm\etc\npmrc Completed in 1ms
13 timing config:load:global Completed in 1ms
14 timing config:load:setEnvs Completed in 2ms
15 timing config:load Completed in 21ms
16 timing npm:load:configload Completed in 22ms
17 timing npm:load:mkdirpcache Completed in 1ms
18 timing npm:load:mkdirplogs Completed in 1ms
19 verbose title npm exec dalai serve
20 verbose argv "exec" "--loglevel" "verbose" "--" "dalai" "serve"
21 timing npm:load:setTitle Completed in 4ms
22 timing config:load:flatten Completed in 7ms
23 timing npm:load:display Completed in 24ms
24 verbose logfile logs-max:10 dir:C:\Users\Humanity\AppData\Local\npm-cache\_logs\2023-03-30T14_27_25_558Z-
25 verbose logfile C:\Users\Humanity\AppData\Local\npm-cache\_logs\2023-03-30T14_27_25_558Z-debug-0.log
26 timing npm:load:logFile Completed in 17ms
27 timing npm:load:timers Completed in 0ms
28 timing npm:load:configScope Completed in 0ms
29 timing npm:load Completed in 74ms
30 silly logfile start cleaning logs, removing 2 files
31 timing arborist:ctor Completed in 2ms
32 silly logfile done cleaning log files
33 timing arborist:ctor Completed in 0ms
34 http fetch GET 200 https://registry.npmjs.org/dalai 935ms (cache revalidated)
35 timing arborist:ctor Completed in 0ms
36 timing arborist:ctor Completed in 0ms
37 timing arborist:ctor Completed in 0ms

Is my hardware too weak? Or any hints? I don't get any error messages, so I don't know what to do either... Can someone please tell me how I can troubleshoot further?

(:

VanHallein commented 1 year ago

Dalai is currently having issues with installing the llama model, as there are issues with the PowerShell script. I think it is related to #241. Try downloading alpaca.7B as an alternative, it should at least work and give you some output.

Nocturna22 commented 1 year ago

Dalai is currently having issues with installing the llama model, as there are issues with the PowerShell script. I think it is related to #241. Try downloading alpaca.7B as an alternative, it should at least work and give you some output.

Thanks for your answer, but I'm trying with Alpaca and not LLaMA :C Maybe it doesn't work with Alpaca now either?

It seems to be something different. I tried to follow these steps: https://github.com/cocktailpeanut/dalai/issues/245#issuecomment-1481806448 But main.exe already exists in the folder.

pdavis68 commented 1 year ago

Have you tried running main.exe from the command-line with the parameters supplied?

If it responds right away without a response to your prompt, try: echo %errorlevel%

Nocturna22 commented 1 year ago

Have you tried running main.exe from the command-line with the parameters supplied?

If it responds right away without a response to your prompt, try: echo %errorlevel%

Thanks for the hint. But unfortunately I do not know which parameters I can pass, and I don't know what to google for either. If I start main.exe without parameters and then output the errorlevel, I get the following integer back:

-1073741795
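
(For reference: a negative %errorlevel% like this is usually the Windows NTSTATUS code of the crash, shown as a signed 32-bit integer. A minimal C sketch, assuming the integer above is that raw exit status, to see it in the usual hex form:)

    /* Print a cmd %errorlevel% value as a 32-bit NTSTATUS code in hex. */
    #include <stdio.h>

    int main(void) {
        int errorlevel = -1073741795;             /* the value reported above */
        printf("0x%08X\n", (unsigned)errorlevel); /* prints 0xC000001D */
        return 0;
    }

(0xC000001D is STATUS_ILLEGAL_INSTRUCTION, which can happen when a binary was built with CPU instructions such as AVX/AVX2 that the processor does not support; an access violation / classic segfault would be 0xC0000005 instead.)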

pdavis68 commented 1 year ago

Segmentation fault. I had the same issue. Don't have a fix.

glozachmeur commented 1 year ago

Same, I have this kind of error with LLaMA and Alpaca 30B:

llama_model_load: ggml ctx size = 21450.50 MB
Segmentation fault
exit

I use Docker and this error is related to a lack of RAM, I guess... I have about 22 GB of RAM free before running the model, sometimes less, so maybe 32 GB of RAM isn't enough to run 30B models with Docker? The vmmem process takes about 8-10 GB of RAM before running the model :thinking:

Nocturna22 commented 1 year ago

Same, I have this kind of error with LLaMA and Alpaca 30B:

llama_model_load: ggml ctx size = 21450.50 MB
Segmentation fault
exit

I use Docker and this error is related to a lack of RAM, I guess... I have about 22 GB of RAM free before running the model, sometimes less, so maybe 32 GB of RAM isn't enough to run 30B models with Docker? The vmmem process takes about 8-10 GB of RAM before running the model 🤔

Hmmmm, on this site the requirements are:

2. Memory Requirements

Runs on most modern computers. Unless your computer is very very old, it should work.

According to https://github.com/ggerganov/llama.cpp/issues/13, here are the memory requirements:

    7B => ~4 GB
    13B => ~8 GB
    30B => ~16 GB
    65B => ~32 GB
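
(Rough sanity check of those figures: q4_0 quantization stores roughly half a byte per weight, so 7B parameters come to about 3.5 GB of weights plus context buffers, which lines up with the ~4 GB above. On that basis the 12 GB of RAM in my mini laptop should, in principle, be enough for the 7B model.)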

EDIT: Maybe this helps someone (I hope the link works for you...):

https://www.phind.com/search?cache=6a6b46d4-d929-49b7-bbdd-2d3cca237077

pdavis68 commented 1 year ago

I would get it regardless of model (tried 7B and 13B) and I have 32GB on my machine. And it worked fine running under a Linux VM with only 16GB. So the problem appears to be specific to the Windows version.

maifeeulasad commented 1 year ago

@pdavis68 , virtualization 🤣 ☕

It's running flawlessly on my Linux environment, but I've been having trouble getting it to work on my Windows 11 PC with 6 GB of V-RAM and 16 GB of RAM.

Problem description:

jesko42 commented 1 year ago

The crashes happen because of a programming error: the code never checks whether it actually got the memory for the model or not ;-)

In ggml.c, around line 2450, the buffer is only checked for alignment, not for whether the allocation succeeded at all: ggml_assert_aligned(ctx->mem_buffer);

Add at line 2450: assert(ctx->mem_buffer); or print out an error: if (NULL == ctx->mem_buffer) { printf("Not enough memory to store model\n"); exit(-1); }
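
A minimal sketch of how that check could look (field names as in the ggml.c of that era, which already includes stdio.h/stdlib.h; the exact line number will differ between versions):

    // In ggml_init(), right after ctx->mem_buffer has been set up and
    // before the existing alignment check: fail loudly if allocation failed.
    if (ctx->mem_buffer == NULL) {
        fprintf(stderr, "ggml_init: failed to allocate %zu bytes for the model\n",
                ctx->mem_size);
        exit(1);
    }

    ggml_assert_aligned(ctx->mem_buffer);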


The description on the main page on GitHub is wrong. The model does not have to reside where the .exe is, but where you start it from. So if you build with CMake within the build directory, the model has to reside in the "build" directory itself, even if the executable is in "Release/main". Also, the .exe is NOT named "chat" but "main", etc.

So (on Windows, with CMake and for example Visual Studio 2019 installed):

Nocturna22 commented 1 year ago

On Linux I have everything running on the same machine. I think I will stay with Linux... But if I ever switch to Windows I will try that :) Thank you.

(And yes... with 4 cores it is really pathetic :P I hope there will be a better quantized model sometime. Then I will test it again with this machine. But as it is, 4 cores at that clock speed are just too weak. (But with 24 cores it goes really fast xD))

I don't know if I can close this issue now because I can't test it. Those who have the same problem: please report.

toolchild commented 1 year ago

I am running on Windows 10. When I had segmentation faults it was because I was trying to use a model that was too big (30B).

I can run the 13B one with 32 GB of RAM and 16 GB of video card memory. It never seems to use the video card memory, though. The whole system uses up to about 24 GB of RAM when it's working and about 16 GB when idling; the vmmem process alone takes about 4 GB when idling and about 13 or 14 GB when working. So I think a 16 GB RAM Windows PC will not work with the 13B model, but should with the 7B model.

If you get segmentation faults with the 7B model, it is likely the hardware, at least according to what different AIs told me.

pdavis68 commented 1 year ago

I get the segmentation fault under Windows with the 7B model. Using the same machine, running it in a Linux VM with only 16 GB of the 32 GB allocated to it, it runs fine. So it's not a hardware issue for me.

t0fum4n commented 1 year ago

On Linux I have everything running on the same machine. I think I will stay with Linux... But if I ever switch to Windows I will try that :) Thank you.

(And yes... with 4 cores it is really pathetic :P I hope there will be a better quantized model sometime. Then I will test it again with this machine. But as it is, 4 cores at that clock speed are just too weak. (But with 24 cores it goes really fast xD))

I don't know if I can close this issue now because I can't test it. Those who have the same problem: please report.

So you were able to get it running in Linux? I've tried Win 10 but I'm spinning up an Ubuntu 20.04 to try.

Atharva2628 commented 1 year ago

Guys! I finally got it to work!!!

It's a long procedure, but bear with me.

OS: Windows 10

Option 1:

Step 0 - Rules and prerequisites:

Make sure you have enough storage space to download Visual Studio, the models and any other dependencies. Have at least 25 GB. I recommend 50 GB.
Make sure you have enough RAM to run the model. Have at least 8 GB total. I recommend 12 GB for the 7B model.
Make sure your CPU has enough threads. Have at least 4 threads. I recommend 12.
Make sure your internet connection is good during the process. Otherwise, the system may throw a tantrum like a 6 year old.
Make sure the computer doesn't sleep, set the sleep time to 'Never'. Otherwise, the system may throw a tantrum like a 6 year old.
Make sure to have enough patience to not interrupt the download/installs. Otherwise, the system may throw a tantrum like a 6 year old.
Make sure the system doesn't throw a tantrum like a 6 year old.

Anyway, that being said, below are the steps.

Step 1 - Fresh start:

Remove all files, folders and any other references to 'Dalai', 'Llama', 'Alpaca', etc. Delete all of it. Make sure when you search any of those terms, nothing shows up. Also type 'dalai' and 'npx dalai' in the command prompt and make sure these commands are NOT recognized. Also remove/uninstall Visual Studio if you have it installed.

Step 2 - Check Versions:

Check your Python and Node versions. Type in command prompt:

    python -V
    node --version

Make sure Python is below 3.10 (<3.10) and Node is above 18 (>18). If that's not the case, uninstall them and install versions that meet these requirements.

Python Download: https://www.python.org/downloads/release/python-398/
Node Download: https://nodejs.org/en/download

Step 3 - Install Visual Studio:

This is a requirement. Install Visual Studio (Latest version).
VS Download: https://visualstudio.microsoft.com/downloads/

IMPORTANT: While downloading, select these options (Python, Node, C++):

[screenshot: Visual Studio installer workload selection]

Proceed to download, sign in to your Microsoft account on VS and make sure all your Visual Studio tools are up to date.

Step 4 - Install node-pty:

This is what fixed it for me. Open Command Prompt as Administrator. Type in:

    npm install node-pty

Step 5 - Installing the model:

Open Command Prompt as Administrator. (Make sure you are in CMD and not PowerShell.) Type in:

    npx dalai alpaca install 7B
    npx dalai llama install 7B

(or whatever other model you need, like 13B, 30B, etc.)

DO NOT interrupt the install in any way. Otherwise the system may throw tantrums.

Step 6 - Start server:

Type in command prompt:

    npx dalai serve

It should begin at http://localhost:3000/

Step 7 - Using the WebUI:

DO NOT open the link (http://localhost:3000/) in your normal browser. ONLY USE INCOGNITO. This is because the program seems to have problems when your browser has an active history or something like that (still not entirely sure why).

Step 8 - Generating a prompt:

When you open the WebUI, select alpaca.7B from the model list (even though it's already selected, click and re-select it). To test if the prompts are working, don't change any settings and type in a small and easy prompt such as: "today's date is" (make sure to delete the template prompt before typing this in). Depending on your processing power, you should get an output within 2 minutes at most (ideally within 30 seconds). If this works, you can now tweak settings and prompts to your liking :)

Option 2:

If option 1 doesn't work, just use a Linux computer. The process is MUCH simpler (just 2 lines of code) and doesn't throw any weird errors. I was able to get it running on my low-powered Linux laptop within minutes.

Option 3:

If none of the previous options work, pray to the UwU Gods for a miracle and try again.

Nocturna22 commented 1 year ago

So you were able to get it running in Linux? I've tried Win 10 but I'm spinning up an Ubuntu 20.04 to try.

Yes, I was able to run it on Linux (Debian/Ubuntu-based distro: ZorinOS). I've made a dual boot and I think I will test it on Windows right now with the 2 solutions presented here. Then I will report :)

pdavis68 commented 1 year ago

Why dual boot? Wouldn't it be easier to use a VM under Hyper-V? Unless memory is the issue.

Nocturna22 commented 1 year ago

@Atharva2628 This did not work for me. Same error.

@jesko42 Your solution also does not work for me.

@pdavis68 The laptop only has 4 weak cores. I wanted to put as little load on it as possible. The CPU hits 100% in Windows really fast.

I don't know why I forget about Docker all the time. I used Docker (WSL) to get it running on that machine. But it's reeeeeeeally slow. On Linux it's much faster.

t0fum4n commented 1 year ago

@Atharva2628 This did not work for me. Same error.

@jesko42 Your solution also does not work for me.

@pdavis68 The laptop only has 4 weak cores. I wanted to put as little load on it as possible. The CPU hits 100% in Windows really fast.

I don't know why I forget about Docker all the time. I used Docker (WSL) to get it running on that machine. But it's reeeeeeeally slow. On Linux it's much faster.

I'm going to be installing it on Ubuntu 20.04 later this evening. It's going to be a VM on a Win10 host. I will report back on that install.

jesko42 commented 1 year ago

Hey all, just so I am NOT misunderstood: I told you that "the crashes appear because of programming errors". My "solution" only SHOWS whether you have too little memory, either by asserting the error in debug builds or by writing out a line of text and exiting in that case!!!

I cannot help it if you have too little memory. What is questionable to me, in the end, is why there is NO use of virtual memory... I could investigate here, but not yet, as I'm currently very busy ;-) sorry

Nocturna22 commented 1 year ago

Hey all, just so I am NOT misunderstood: I told you that "the crashes appear because of programming errors". My "solution" only SHOWS whether you have too little memory, either by asserting the error in debug builds or by writing out a line of text and exiting in that case!!!

I cannot help it if you have too little memory. What is questionable to me, in the end, is why there is NO use of virtual memory... I could investigate here, but not yet, as I'm currently very busy ;-) sorry

All good, I know that. But after compiling I got some errors and I couldn't even start Dalai. (Probably I made a mistake, but I don't really think it's related to that, because the Docker image works... sooo it has to be something else. And others say that they have enough RAM in any case.)