-
Is there any way to use Ollama to host LLM models?
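For context, Ollama does serve pulled models over a local REST API. A minimal sketch, assuming a default install (server on port 11434) and a pulled `llama2` model:

```python
import requests

# Query a locally running Ollama server (started with `ollama serve`
# after `ollama pull llama2`). Model name and prompt are placeholders.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```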
-
Hi.
I have been doing some benchmarks on an NVIDIA V100 32GB GPU.
First, I benchmarked Llama2-7B-chat using Hugging Face Transformers and CTranslate2. I saw reduced latency when using CT2 (12 secon…
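For reference, a minimal sketch of how such a CTranslate2 run is typically set up (not necessarily the exact benchmark code; model paths, sampling settings, and the prompt are placeholders):

```python
import ctranslate2
import transformers

# Assumes a prior conversion step, e.g.:
#   ct2-transformers-converter --model meta-llama/Llama-2-7b-chat-hf \
#       --output_dir llama2-7b-chat-ct2 --quantization float16
generator = ctranslate2.Generator("llama2-7b-chat-ct2", device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf"
)

prompt = "What is the capital of France?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=256, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))
```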
-
Under Windows:
java -jar chat-console.jar --model llama2 --system
The result is:
Error: chat.octet.exceptions.ServerException: Can not read model configuration file, please make sure it is …
-
### Describe the feature
How should I construct a dataset to train on large amounts of long text with colossal-llama2-7B?
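The exact schema expected by the Colossal-LLaMA-2 data pipeline is defined in its repo; as a hedged illustration, one common way to prepare large amounts of long text is to split each document into context-sized chunks and write them as JSONL. The field name (`text`), chunk size, and file name below are assumptions, not the repo's confirmed format:

```python
import json

MAX_CHARS = 8000  # placeholder chunk size; tune to the model's context window

def chunk_document(doc: str, max_chars: int = MAX_CHARS):
    """Split a long document into consecutive fixed-size chunks."""
    return [doc[i : i + max_chars] for i in range(0, len(doc), max_chars)]

documents = ["...long document 1...", "...long document 2..."]  # placeholders

# Hypothetical JSONL layout with a single "text" field per sample;
# verify the actual schema against Colossal-LLaMA-2's documentation.
with open("long_text_dataset.jsonl", "w", encoding="utf-8") as out:
    for doc in documents:
        for chunk in chunk_document(doc):
            out.write(json.dumps({"text": chunk}, ensure_ascii=False) + "\n")
```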
-
![image](https://github.com/ljy0ustc/LLaRA/assets/47343901/4bd566e6-5dc8-4f80-8b19-c60e26b6c414)
Could you help me solve this problem? 👆
-
Reproduction steps:
1. Clone the vllm repo and switch to [tag v0.3.1](https://github.com/vllm-project/vllm/tree/v0.3.1)
2. Build the Dockerfile.rocm dockerfile with instructions from [Option 3: Bui…
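Once the image builds, a quick way to confirm the container works is to hit vLLM's OpenAI-compatible server. A minimal sketch, assuming the server was started inside the container (e.g. via `python -m vllm.entrypoints.openai.api_server --model <model>`) with port 8000 published; the model name is a placeholder:

```python
import requests

# Smoke test against a running vLLM container; assumes the
# OpenAI-compatible server is listening on localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-hf",  # placeholder model name
        "prompt": "Hello",
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```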
-
### System Info
- `transformers` version: 4.41.0.dev0
- Platform: Linux-5.15.0-92-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.21.4
- Safetensors version…
-
NeuronXXXModel classes (e.g. NeuronDecoderModel in optimum/neuron/modeling_decoder.py) invoke transformers-neuronx to compile the target model; however, these classes don't pass all the supported input …
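For context, a sketch of how compile-time inputs are passed through the documented export path today (argument names and values are illustrative and should be checked against the optimum-neuron docs):

```python
from optimum.neuron import NeuronModelForCausalLM

# Export + compile via transformers-neuronx; only a subset of the
# inputs supported by transformers-neuronx is exposed as kwargs here,
# which is the gap this issue describes. Values are placeholders.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="fp16",
)
```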
-
When using an LLM for an NER task, there is a warning saying "This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the mode…
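One common mitigation, sketched below, is to budget tokens so that the prompt plus the requested generation stays under the 4096-token window. The model name and budget values are placeholders:

```python
from transformers import AutoTokenizer

MODEL_MAX_LEN = 4096   # the model's predefined maximum length
MAX_NEW_TOKENS = 512   # placeholder generation budget

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
prompt = "..."  # NER instruction plus the document to tag
# Truncate the prompt so prompt tokens + new tokens fit in the window.
input_ids = tokenizer(
    prompt,
    truncation=True,
    max_length=MODEL_MAX_LEN - MAX_NEW_TOKENS,
).input_ids
```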
-
### What happened?
When trying to use Ollama with LiteLLM with `stream=True`, an exception is thrown.
litellm version: 1.15.0
1. Serve ollama locally on port 11434 (or replace the port in the…
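A minimal repro sketch for the streaming path, assuming an Ollama server on the default port 11434 and a pulled `llama2` model:

```python
from litellm import completion

# Streaming call through LiteLLM's Ollama provider; the reported
# exception occurs on this path.
response = completion(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434",
    stream=True,
)
for chunk in response:
    print(chunk)
```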