Closed Abulhanan closed 2 weeks ago
Version of Aphrodite: v0.5.1

Can you provide a link to your GGUF file? Also, 0.5.1 is very old; please try 0.5.2 (download the .whl and install via pip), or build from main/dev.
wait
```python
Model = "/kaggle/" # @param ["Kooten/Kunoichi-DPO-v2-7B-8bpw-exl2", "TheBloke/UNA-TheBeagle-7B-v1-GPTQ", "LoneStriker/Fimbulvetr-11B-v2-GPTQ", "TheBloke/OpenHermes-2.5-Mistral-7B-AWQ", "TheBloke/MythoMax-L2-13B-GPTQ", "TheBloke/wizard-mega-13B-GPTQ", "Austism/chronos-hermes-13b-v2-GPTQ", "KoboldAI/OPT-6B-nerys-v2", "NousResearch/Nous-Hermes-Llama2-13b"] {allow-input: true}
Revision = "main" #@param {allow-input: true}
Quantization = "gguf" #@param ["None", "exl2", "gptq", "awq", "aqlm", "quip", "marlin"]
GPU_Memory_Utilization = 1 #@param {type:"slider", min:0, max:1, step:0.01}
Context_Length = 16000 #@param {type:"slider", min:1024, max:32768, step:1024}
enforce_eager_mode = True #@param {type:"boolean"}
launch_kobold_api = False #@param {type:"boolean"}
OpenAI_API_Key = "" #@param {allow-input: true}
FP8_KV_Cache = True #@param {type:"boolean"}
```
```python
!pip install -U "ray[all]"
!pip install grpcio==1.62.1

%pip show aphrodite-engine &> /dev/null && echo "Existing Aphrodite Engine installation found. Updating..." && pip uninstall aphrodite-engine -q -y
!echo "Installing/Updating the Aphrodite Engine, this may take a while..."
%pip install aphrodite-engine==0.5.1 > /dev/null 2>&1
!echo "Installation successful! Starting the engine now."

!pip3 install pyngrok
!echo "Creating a Ngrok URL..."
from pyngrok import ngrok
!ngrok authtoken 2Xek0NdHusUxivPazybUushIkyx_6gf88UA2EDx34b2RKw8r1
tunnel = ngrok.connect(2242)
!echo "============================================================"
!echo "Please copy this URL:"
print(tunnel.public_url)
!echo "============================================================"
```
```python
model = Model
gpu_memory_utilization = GPU_Memory_Utilization
context_length = Context_Length
api_key = OpenAI_API_Key
quant = Quantization
enforce_eager = enforce_eager_mode
kobold = launch_kobold_api
revision = Revision
fp8_kv = FP8_KV_Cache

command = [
    "python", "-m", "aphrodite.endpoints.openai.api_server",
    "--dtype", "float16",
    "--model", model,
    "--host", "127.0.0.1",
    "--max-log-len", "0",
    "--gpu-memory-utilization", str(gpu_memory_utilization),
    "--max-model-len", str(context_length),
    "--tensor-parallel-size", "2",
    "--tokenizer", "philschmid/meta-llama-3-tokenizer",
]

if kobold:
    command.append("--launch-kobold-api")
if quant != "None":
    command.extend(["-q", quant])
if enforce_eager:
    command.append("--enforce-eager")
if fp8_kv:
    command.extend(["--kv-cache-dtype", "fp8_e5m2"])
if api_key != "":
    command.extend(["--api-keys", api_key])

!{" ".join(command)}
```
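A side note on the flag assembly above: because the notebook joins the list into a single shell string, appending a flag and its value as one pre-joined string also happens to work, but keeping them as separate argv items is safer, since the same list then also works with `subprocess.run` without any shell quoting surprises. A minimal sketch of that pattern (the variable values here are illustrative, not from the notebook):

```python
# Build the server command as an argv list, one item per flag/value,
# so it works both joined into a shell string and passed to subprocess.
quant, enforce_eager, fp8_kv, api_key = "gguf", True, True, ""

command = [
    "python", "-m", "aphrodite.endpoints.openai.api_server",
    "--model", "/kaggle/model",
    "--tensor-parallel-size", "2",
]
if quant != "None":
    command.extend(["-q", quant])          # flag and value as two items
if enforce_eager:
    command.append("--enforce-eager")      # boolean flag: a single item
if fp8_kv:
    command.extend(["--kv-cache-dtype", "fp8_e5m2"])
if api_key:
    command.extend(["--api-keys", api_key])

print(" ".join(command))
```

With this shape, `subprocess.run(command)` would launch the same server without going through a shell at all.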
The code I use to launch the engine.
```python
!git clone https://github.com/PygmalionAI/aphrodite-engine.git
%cd /kaggle/working/aphrodite-engine/examples/
!python gguf_to_torch.py --input /kaggle/working/Meta-Llama-3-70B-Instruct.IQ1_S.gguf --output /kaggle/
```
The pre-conversion code.
```python
%cd /kaggle/
!wget -N https://huggingface.co/MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF/resolve/main/Meta-Llama-3-70B-Instruct.IQ2_XS.gguf
```
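Since a partial or corrupted download is a common cause of conversion failures, a quick sanity check on the downloaded file can save time. Per the GGUF format, the file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 version. A minimal sketch (the `/tmp/fake.gguf` demo file is synthetic; point the check at the real downloaded path instead):

```python
import struct

def check_gguf_header(path: str) -> int:
    """Read the first 8 bytes: 4-byte magic b'GGUF' + uint32 version (LE)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
    return version

# Demo on a synthetic header; a real check would target e.g.
# /kaggle/Meta-Llama-3-70B-Instruct.IQ2_XS.gguf
with open("/tmp/fake.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))
print(check_gguf_header("/tmp/fake.gguf"))  # -> 3
```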
The code I use to download models.

It's a modified version of the Colab code, but I use Kaggle, which has 2 T4 GPUs and 30 GB RAM.
If I install it via pip, how do I access the pre-conversion script?
You are using a mismatched aphrodite package and conversion script. You installed v0.5.1 but used the conversion script from main. You need to either `git checkout` the specific tag v0.5.1, or build from source. And I am not sure v0.5.1 supports Llama 3; you probably need a newer version for the new tokenizer.
Okay, I'm a bit new, but can you help me a little with installing it?
Since you are using Kaggle, I suppose you can't build from source, so you would have to wait for the 0.5.3 release to run Llama 3 models.
I can run Llama 3 easily, the 8B model at least, but with quantization I get stuck on any model.
```python
%cd /kaggle/
!wget -N https://github.com/PygmalionAI/aphrodite-engine/releases/download/v0.5.2/aphrodite_engine-0.5.2+cu118-cp310-cp310-manylinux1_x86_64.whl
!pip install /kaggle/aphrodite_engine-0.5.2+cu118-cp310-cp310-manylinux1_x86_64.whl
```
I used this script and now it's installing via pip, but how do I access the conversion script?

I can also build from source by git cloning.
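As an aside, the wheel filename above encodes what it targets: the version plus a local CUDA tag (`0.5.2+cu118`), the Python tag (`cp310` = CPython 3.10, which is what Kaggle images typically ship), the ABI tag, and the platform. A small sketch that splits the name and compares the Python tag against the running interpreter:

```python
import sys

# Wheel filename layout: name-version-pythontag-abitag-platform.whl
wheel = "aphrodite_engine-0.5.2+cu118-cp310-cp310-manylinux1_x86_64.whl"
name, version, py_tag, abi_tag, platform = wheel[:-len(".whl")].split("-")

current = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(name, version, py_tag)
print("wheel", "matches" if py_tag == current else "does not match",
      "this interpreter:", current)
```

If the tags don't match, pip refuses to install the wheel, which is a common stumbling block when a notebook image updates its Python version.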
Run `git checkout <tag>`, for example `git checkout v0.5.2`, to get the conversion script for 0.5.2.
Okay... after using the pip installation?
First, please keep this GitHub issue clean and precise. GitHub issues are not chat rooms; please group your responses into a single block when convenient.
The source code and the installed pip package are two separate things. If you have installed the v0.5.2 pip package, you clone the Aphrodite git repo, check out the corresponding code with `git checkout v0.5.2`, then run the conversion script. That assumes you have a shell environment. I don't know how Kaggle works; if you have questions about Kaggle, or about using git checkout/pip on Kaggle, please ask them in the corresponding support channels.
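To illustrate what checking out a tag does (it pins the working tree to the tagged commit, leaving you on a detached HEAD), here is a self-contained sketch on a throwaway local repo; for the real workflow you would substitute `git clone https://github.com/PygmalionAI/aphrodite-engine.git` and `git checkout v0.5.2`:

```shell
set -e
# Throwaway repo standing in for a cloned aphrodite-engine checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "release work"
git tag v0.5.2                         # tag the release commit
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "later main work"

git checkout -q v0.5.2                 # detached HEAD at the tagged commit
git describe --tags                    # prints: v0.5.2
```

After the checkout, the files in the working tree (including `examples/gguf_to_torch.py` in the real repo) match the tagged release, regardless of what has since landed on main.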
Your current environment
How did you install Aphrodite?
When using pre-conversion, I ran into the following error: `ValueError: 17 is not a valid GGMLQuantizationType`.

But when using the engine's own automatic conversion, the error doesn't show and it converts; I just run low on memory with auto conversion.
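For context on that error: in llama.cpp's quantization-type numbering, id 17 corresponds, if I recall correctly, to the IQ2_XS quant, which was added relatively recently. An older conversion script whose enum predates the IQ quants will reject that id exactly this way. A minimal sketch with a hypothetical reduced enum (the real one lives in the `gguf` package):

```python
from enum import IntEnum

# Hypothetical reduced enum, mimicking an older script that predates
# the IQ2_XS quant type; the real enum has many more members.
class OldGGMLQuantizationType(IntEnum):
    F32 = 0
    F16 = 1
    Q8_0 = 8

def read_quant_type(raw: int):
    try:
        return OldGGMLQuantizationType(raw)
    except ValueError as e:
        # An IQ2_XS tensor stores type id 17; an outdated enum rejects it.
        return str(e)

print(read_quant_type(17))  # -> 17 is not a valid OldGGMLQuantizationType
```

That matches the advice above: the fix is to use a conversion script from a source tree new enough to know the quant types in your GGUF file.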