-
### System Info
TGI version: latest; single NVIDIA GeForce RTX 3090.
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
…
-
### What happened?
I'm getting a register count overflow when trying to run llama3.1_405b_fp16 on 8 HIP devices targeting gfx942:
```
iree/runtime/src/iree/vm/bytecode/verifier.c:345: RESOURCE_EXHAUST…
```
-
### System Info
lorax_version=0.12.0
Using Docker to host the model: it runs perfectly for Llama3.1-8b, but with Llama3.2-11b I am getting the following error:
ModuleNotFoundError: No module…
-
Thanks for this interesting project.
I learned about this project while using Ollama. Since Ollama doesn't support log_prob, I was interested in trying Optillm.
I have been trying for the last fe…
-
```python
from edsl import Model
import time
models_list = [['Austism/chronos-hermes-13b-v2', 'deep_infra', 0], ['BAAI/bge-base-en-v1.5', 'together', 1], ['BAAI/bge-large-en-v1.5', 'together', …
```
-
The script I'm using is below; how can I convert it into a torchrun command?
```
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
--model_type llava1_6-llama3_1-8b-instruct \
    --model_id_or_path .cache/modelscope/hub/swift/…
```
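A minimal sketch of one possible translation, assuming ms-swift exposes its fine-tuning entry point as the importable Python module `swift.cli.sft` (so it can be launched via torchrun's module mode) and that the elided training flags carry over unchanged:
```bash
# Sketch: run the same 8-GPU sft job under torchrun instead of the swift launcher.
# Assumes ms-swift's CLI entry point is importable as the module swift.cli.sft.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun --nproc_per_node 8 -m swift.cli.sft \
    --model_type llava1_6-llama3_1-8b-instruct \
    --model_id_or_path .cache/modelscope/hub/swift/…  # remaining flags as in the original script
```
Here `NPROC_PER_NODE=8` becomes torchrun's `--nproc_per_node 8`, while the flags after `sft` are passed through untouched.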
-
When running the notebook for inference using [Llama3](https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb):
```
…
```
-
Hello, I failed to convert Llama3.2 3B with TRT-LLM when I tried to run convert_checkpoint.py.
(like this link - https://github.com/NVIDIA/TensorRT-LLM/issues/2339)
I want to know if Llama3.2 3B model con…
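For context, the Llama conversion script in TensorRT-LLM's `examples/llama` directory is typically invoked roughly as in the sketch below; the checkpoint paths are placeholders, and the exact flag set can vary between TensorRT-LLM releases:
```bash
# Sketch: convert a Hugging Face Llama checkpoint into TensorRT-LLM's checkpoint format.
# ./Llama-3.2-3B-Instruct and ./tllm_ckpt_3b are placeholder paths, not from the report.
python examples/llama/convert_checkpoint.py \
    --model_dir ./Llama-3.2-3B-Instruct \
    --output_dir ./tllm_ckpt_3b \
    --dtype float16
```
Comparing against a baseline invocation like this can help isolate whether the failure comes from the arguments or from the model itself.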
-
Hi,
I want to work with the newly added model llama3_2_3b_instruct_q40, but I get an error when downloading the model in the Docker container. I checked launch.py and the issue is caused by this …