ycros opened 10 months ago
Does it display another error when you kill the server with Ctrl + C?
No idea, but it still seemed somewhat functional. I sort of just killed the entire runpod after that.
The most likely cause for that error is a CUDA OOM error, so you may need to lower your number of threads.
I dunno, I tried again - this time with an AWQ 32g quant of Mixtral (about 26 GB on disk) instead of fp16, on 2x A6000 (48 GB VRAM each). In a separate run, on a separate server, I did push it far enough that it OOM'd, and I clearly saw the CUDA OOM errors then. I don't see any such messages in this case.
This time I kept it to only 1 thread in the horde client, and I tried both -gmu 0.98 and -gmu 0.8 - though I frankly have no idea how I should be tuning these values.
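For what it's worth, -gmu is the fraction of each GPU's VRAM the engine is allowed to reserve, so the two settings can be sanity-checked with a rough back-of-envelope using the numbers from this setup. This is only a sketch: the assumption that the ~26 GB of weights split evenly across both cards under -tp 2 is mine, not from the logs.

```python
# Rough -gmu budget estimate. Assumptions (not from the logs): the engine
# reserves about gmu * total VRAM per GPU, and with -tp 2 the AWQ weights
# are split roughly evenly across the two cards.
vram_per_gpu_gb = 48        # A6000
weights_on_disk_gb = 26     # AWQ 32g quant of Mixtral
tensor_parallel = 2

weights_per_gpu_gb = weights_on_disk_gb / tensor_parallel  # ~13 GB

for gmu in (0.8, 0.98):
    reserved_gb = gmu * vram_per_gpu_gb
    # Whatever remains after weights is roughly the KV-cache/activation budget.
    kv_budget_gb = reserved_gb - weights_per_gpu_gb
    print(f"gmu={gmu}: reserves ~{reserved_gb:.1f} GB/GPU, "
          f"~{kv_budget_gb:.1f} GB left for KV cache")
```

If either estimate came out near zero that would point at OOM, but here both leave plenty of headroom, which matches the absence of CUDA OOM messages.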
My cmd line: python -m aphrodite.endpoints.kobold.api_server --host 0.0.0.0 --served-model-name BagelMIsteryTour-v2-8x7B --model ~/ycros/BagelMIsteryTour-v2-8x7B-AWQ --max-length 1024 -tp 2 -gmu 0.8 --quantization awq --kv-cache-dtype fp8
I'm on a39eeb7188d8bc91a43712435b27ad9e4c2b98d1
running from source.
The failed requests as reported by horde are all these:
Something went wrong when processing request. Please check your trace.log file for the full stack trace. Payload: {'prompt': 'PROMPT REDACTED', 'n': 1, 'max_context_length': 2048, 'max_length': 64, 'rep_pen': 1.1, 'rep_pen_range': 1024,
'rep_pen_slope': 0.7, 'temperature': 0.9, 'tfs': 1.0, 'top_a': 0.0, 'top_k': 0, 'top_p': 0.9, 'typical': 1.0, 'sampler_order': [6, 0, 1, 2, 3, 4, 5], 'use_default_badwordsids': True, 'stop_sequence': [], 'min_p': 0.0, 'dynatemp_range': 0.0,
'dynatemp_exponent': 1.0, 'quiet': True, 'request_type': 'text2text', 'model': 'aphrodite/BagelMIsteryTour-v2-8x7B'}
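One way to narrow this down would be to replay the exact failing payload against the server directly, cutting horde out of the loop. A minimal sketch with the standard library only; the endpoint path /api/v1/generate and port 2242 are assumptions based on the KoboldAI API this server emulates, and the prompt is a placeholder since the real one was redacted:

```python
import json
from urllib import request

# The failing payload as reported by horde, minus the redacted prompt.
payload = {
    "prompt": "test prompt",  # placeholder; the real prompt was redacted
    "n": 1,
    "max_context_length": 2048,
    "max_length": 64,
    "rep_pen": 1.1,
    "rep_pen_range": 1024,
    "rep_pen_slope": 0.7,
    "temperature": 0.9,
    "tfs": 1.0,
    "top_a": 0.0,
    "top_k": 0,
    "top_p": 0.9,
    "typical": 1.0,
    "sampler_order": [6, 0, 1, 2, 3, 4, 5],
    "use_default_badwordsids": True,
    "stop_sequence": [],
    "min_p": 0.0,
    "dynatemp_range": 0.0,
    "dynatemp_exponent": 1.0,
    "quiet": True,
}

body = json.dumps(payload).encode()

def send(url="http://127.0.0.1:2242/api/v1/generate"):
    # Call only with the server running; adjust host/port to match your flags.
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

If this reproduces the failure, the difference from the working klite requests (e.g. the sampler_order or badwords fields) would be the place to look.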
When I stop it:
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [7657]
(RayWorker pid=9414) INFO 01-21 10:22:17 model_runner.py:459] Graph capturing finished in 35 secs.
(RayWorker pid=9414) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
root@d38248ce23ec:~#
Here's the log from the terminal as far as my tmux buffer went: aphro-log.txt
Does it log anywhere else I should be looking at before I shut this pod down? Is there anything else you'd like me to try to debug this? (I will probably shut the pod down in say, 12 hours)
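Since the tmux scrollback is the only record so far, it may be worth re-running with all output duplicated to a file so nothing is lost before the pod goes down. A sketch mirroring the command line above; aphro-full.log is an arbitrary name:

```shell
# Same flags as before, with stdout and stderr tee'd to a file that
# survives the tmux buffer.
python -m aphrodite.endpoints.kobold.api_server \
    --host 0.0.0.0 \
    --served-model-name BagelMIsteryTour-v2-8x7B \
    --model ~/ycros/BagelMIsteryTour-v2-8x7B-AWQ \
    --max-length 1024 -tp 2 -gmu 0.8 \
    --quantization awq --kv-cache-dtype fp8 \
    2>&1 | tee aphro-full.log
```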
Everything seems to work fine via the embedded klite interface, but when I pointed horde at it, it started throwing these errors.
It seems to still serve at least some horde requests, though?