Dicklesworthstone / swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

It requires 40G of RAM to work? #2

Closed sujantkumarkv closed 1 year ago

sujantkumarkv commented 1 year ago

I'm on a macbook air m2 16GB.

I tried running it locally with Python, following this line from the README: "To run it natively (not using Docker) in a Python venv, you can use these commands."

It failed a few times with missing packages like greenlet, but the error it gives now concerns RAM allocation.

The traceback ends with:

raise ValueError(f"Cannot allocate {RAMDISK_SIZE_IN_GB}G for RAM Disk. Total system RAM is {total_ram_gb:.2f}G.")
ValueError: Cannot allocate 40G for RAM Disk. Total system RAM is 16.00G.
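For context, the check that raises this error presumably looks something like the following sketch. The message format and the names RAMDISK_SIZE_IN_GB and total_ram_gb come from the traceback above; the function name and signature are hypothetical.

```python
def check_ramdisk_allocation(requested_gb: float, total_ram_gb: float) -> None:
    """Refuse to create a RAM disk larger than the machine's total RAM."""
    if requested_gb > total_ram_gb:
        raise ValueError(
            f"Cannot allocate {requested_gb}G for RAM Disk. "
            f"Total system RAM is {total_ram_gb:.2f}G."
        )

# The reporter's situation: 40G requested on a 16G machine -> ValueError.
```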

Dicklesworthstone commented 1 year ago

You can adjust all the settings by editing the .env file: https://github.com/Dicklesworthstone/llama_embeddings_fastapi_service/blob/main/.env

You have to lower the amount of RAM used, or, since your machine has relatively little RAM, you can simply turn off the RAM disk by setting

USE_RAMDISK=False

in the .env file.
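For example, the relevant .env lines might look like this. USE_RAMDISK and RAMDISK_SIZE_IN_GB are the variable names that appear in this thread; the value 8 below is just an illustration of a size that fits in 16G of RAM, not a project default.

```shell
# .env — disable the optional RAM disk entirely:
USE_RAMDISK=False

# ...or keep it, but size it below your machine's total RAM:
# USE_RAMDISK=True
# RAMDISK_SIZE_IN_GB=8
```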


Dicklesworthstone commented 1 year ago

And no, it doesn't need 40GB of RAM. If you turn off the RAM disk, it just needs enough RAM to load whatever model you have selected. You can focus on smaller models that easily fit in your RAM budget.
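As a rough rule of thumb (my assumption, not something stated in this repo), a quantized model's weights need roughly parameter count times bits per weight in RAM, plus some overhead for the context/KV cache. A quick back-of-the-envelope helper:

```python
def approx_model_ram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough RAM estimate for model weights alone (ignores KV-cache overhead)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# e.g. a 7B model at 4-bit quantization needs on the order of 3.5 GB for
# weights, which fits comfortably in a 16 GB machine.
```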

dlaliberte commented 8 months ago

Thanks for your work and contribution.

Does it work with a GPU? You should document that either way.

Dicklesworthstone commented 8 months ago

It only uses that amount of RAM if you enable the optional RAM disk, which is disabled by default. It uses llama-cpp-python for all the LLM functionality, so you enable GPU support the same way you do for that library; see its README section "Installation with Specific Hardware Acceleration (BLAS, CUDA, Metal, etc.)":

The default pip install behaviour is to build llama.cpp for CPU only on Linux and Windows, and to use Metal on macOS.

llama.cpp supports a number of hardware acceleration backends, including OpenBLAS, cuBLAS, CLBlast, HIPBLAS, and Metal. See the llama.cpp README (https://github.com/ggerganov/llama.cpp#build) for a full list of supported backends.

All of these backends are supported by llama-cpp-python and can be enabled by setting the CMAKE_ARGS environment variable before installing.

On Linux and Mac you set the CMAKE_ARGS like this:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

On Windows you can set the CMAKE_ARGS like this:

$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
