Closed — puppetm4st3r closed this issue 1 month ago
It's a privilege issue; fixed by running chmod -R 777 * inside /app/aphrodite-engine/.cache
If useful for someone with the same issue, the workaround is:
docker exec -u root "container_name" chmod -R 777 /app/aphrodite-engine/.cache
Please 🙏, it's a simple bug, but fixing it adds a lot of value: it practically kills all the previous effort on JSON guided decoding @AlpinDale 🙏, and with every production release we have to enter the container to fix it =(
Hi sorry, I totally missed this issue! Can you run the docker in privileged mode?
That's a good idea. I can't run it permanently in that mode, but I can handle that for a while, thanks!
We already resolved a similar issue related to Triton; it should be fixed in the latest Docker image. Have you tried it?
I'm trying the latest official image and it still has the problem. I also hit another problem with MoEs on the latest version; I've created a separate issue for that. (Sorry for reporting so many bugs, I use your engine a lot.)
The problem indeed still exists. One solution is to mount a host folder with -v /my/local/folder:/app/aphrodite-engine/.cache,
with the added benefit of keeping the cache across container reboots.
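As a sketch, a full invocation with that mount might look like the following. The image name and ports here are placeholders; substitute your actual image, host path, and engine flags:

```shell
# Hypothetical docker run command; image name, ports, and host path are placeholders.
# The -v flag maps a host folder over the container's cache directory, so the
# engine writes its guided-decoding cache to the host (and keeps it across reboots).
docker run --gpus all \
  -v /my/local/folder:/app/aphrodite-engine/.cache \
  -p 3000:3000 \
  your-aphrodite-image
```

Make sure the host folder is writable by the UID the container runs as, otherwise the same permission error reappears through the mount.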
EDIT: I found the problem, it is described in the last comment
Your current environment
🐛 Describe the bug
When trying to generate guided output (with a pydantic JSON schema), it throws an exception that "/app/aphrodite-engine/.cache/" was not found, and for some reason the engine does not have privileges to create that directory. I entered the container and created the directory myself. Tried again and then got a strange SQLite3 exception.
I also get this message: "Token indices sequence length is longer than the specified maximum sequence length for this model (2023 > 1024)", but I have configured the environment for a 12k length.
engine run parameters:
+ exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 7860 --download-dir /data/hub --model LoneStriker/Smaug-34B-v0.1-GPTQ --dtype float16 --kv-cache-dtype fp8_e5m2 --max-model-len 12000 --tensor-parallel-size 2 --gpu-memory-utilization .97 --enforce-eager --disable-log-stats --api-keys 123 --block-size 8 --max-paddings 512 --port 3000 --swap-space 10 --chat-template /home/workspace/chat_templates/gorilla_v2__fc.jinja --served-model-name dolf --max-context-len-to-capture 512 --max-num-batched-tokens 32000 --max-num-seqs 62 --quantization gptq
used json schema:
{"description": "Useful to return the text summarization task", "properties": {"summary": {"description": "The resulting summary of the provided text.", "title": "Summary", "type": "string"}}, "required": ["summary"], "title": "result", "type": "object"}
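For reference, this is the kind of structural guarantee guided decoding is supposed to enforce on the model's output. A minimal stdlib-only sketch of that check, using the schema above (the `conforms` helper is hypothetical, not part of the engine):

```python
import json

# The JSON schema from the bug report, which guided decoding should enforce.
schema = json.loads(
    '{"description": "Useful to return the text summarization task",'
    ' "properties": {"summary": {"description": "The resulting summary of the provided text.",'
    ' "title": "Summary", "type": "string"}},'
    ' "required": ["summary"], "title": "result", "type": "object"}'
)

def conforms(candidate: str) -> bool:
    """Minimal check: parses as JSON, is an object, and has every required string field."""
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    for key in schema.get("required", []):
        if key not in obj:
            return False
        if schema["properties"][key]["type"] == "string" and not isinstance(obj[key], str):
            return False
    return True

print(conforms('{"summary": "A short summary."}'))  # True
print(conforms('plain text, not JSON'))             # False
```

When the cache-directory bug triggers, guided decoding silently degrades and outputs like the second example come back from the endpoint, which is why the permission error matters in production.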
using the official OpenAI client with: