Opened 3 months ago
Replicated the error:
error: the argument '--port <PORT>' cannot be used multiple times
when running
docker run --gpus all -it --rm --ipc=host -p 8000:8000 -e HUGGING_FACE_HUB_TOKEN=hf_*** -v f:/AIOps/Models/8b:/model ghcr.io/choronz/mistral.rs:cuda-89-latest --serve-ip 127.0.0.1 --port 8000 gguf -m /model -f Llama-3.1-Storm-8B.Q6_K.gguf
Without --port 8000, the model loads and runs.
It seems the port is still hardcoded in the Dockerfile.cuda-all file:
line 27: PORT=8000 \
line 54: ENTRYPOINT ["mistralrs-server", "--port", "8000", "--token-source", "env:HUGGING_FACE_HUB_TOKEN"]
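To illustrate why this fails: with an exec-form ENTRYPOINT, Docker appends the user's `docker run` arguments after the baked-in ones, so the server sees `--port` twice. A minimal sketch of the resulting command line (the argument order is an assumption based on standard ENTRYPOINT behavior):

```shell
# Args baked into the image's ENTRYPOINT (from Dockerfile.cuda-all, line 54):
entrypoint_args="--port 8000 --token-source env:HUGGING_FACE_HUB_TOKEN"
# Args the user passes to `docker run` after the image name:
user_args="--serve-ip 127.0.0.1 --port 8000 gguf -m /model -f Llama-3.1-Storm-8B.Q6_K.gguf"
# Docker concatenates them, so the container effectively executes:
final="mistralrs-server $entrypoint_args $user_args"
echo "$final"
# --port now appears twice, which clap rejects with
# "the argument '--port <PORT>' cannot be used multiple times".
```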
@choronz please feel free to open a PR if you have a fix!
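One possible fix (a sketch only, not a tested patch) is to drop the hardcoded `--port` from the ENTRYPOINT so callers can pass it at `docker run` time without duplication; the server would fall back to its default port when the flag is omitted:

```dockerfile
# Hypothetical change to Dockerfile.cuda-all, line 54 (sketch): remove the
# baked-in "--port 8000" so `docker run ... --port 1234` no longer collides.
ENTRYPOINT ["mistralrs-server", "--token-source", "env:HUGGING_FACE_HUB_TOKEN"]
```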
Describe the bug
My environment
Windows 11 Pro, Docker Desktop, WSL2 Ubuntu engine, latest NVIDIA driver
CUDA test
I made sure the Docker WSL2 CUDA integration works correctly by executing:
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
as stated in the documentation. So CUDA works inside Docker with WSL2.
Model loading error
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\duyntnet\Meta-Llama-3.1-8B-Instruct-imatrix-GGUF:/model -p 8080:8080 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf
leads to
Maybe imatrix quants are not supported?
Trying a normal gguf quant also doesn't seem to work:
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:8080 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
leading to:
This is a newer quant, created after the RoPE frequency issue was fixed in llama.cpp.
Port argument not found
Also: I can use the Docker argument
-p 8080:1234
to map ports. The mistral.rs argument --serve-ip 0.0.0.0 works, but --port 1234 doesn't:
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:1234 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 --serve-ip 0.0.0.0 --port 1234 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
leads to
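A possible workaround (a sketch, not verified here) is to stop fighting the baked-in flag: since the image hardcodes --port 8000 in its ENTRYPOINT, publish the host port onto container port 8000 and omit --port entirely:

```shell
# Workaround sketch: map host 8080 to the container's hardcoded port 8000
# instead of passing --port a second time (assumes the server always listens
# on 8000 inside this image).
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:8000 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 --serve-ip 0.0.0.0 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
```

The API would then be reachable on the host at port 8080.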
Latest commit or version
Using the Docker image ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05