-
Implement offline inference with vLLM, using `offline_inference_example.py` from https://github.com/llm-jp/llm-jp-eval/pull/115 as a reference.
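A minimal sketch of what such a vLLM offline-inference script could look like. The file layout (JSONL with one `{"prompt": ...}` record per line), the field names, and the model path are placeholder assumptions, not taken from the PR; the vLLM calls follow its `LLM`/`SamplingParams` batch API.

```python
import json


def load_prompts(path):
    """Read prompts from a JSONL file with one {"prompt": ...} record per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["prompt"] for line in f if line.strip()]


def run_vllm_offline(prompt_path, output_path, model_name="your-org/your-model"):
    """Generate a completion for every prompt and dump the results as JSONL.

    The heavy vLLM import is deferred so the helper above stays importable
    on machines without a GPU. model_name is a placeholder.
    """
    from vllm import LLM, SamplingParams  # requires a CUDA-capable machine

    prompts = load_prompts(prompt_path)
    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=256)
    outputs = llm.generate(prompts, params)
    with open(output_path, "w", encoding="utf-8") as f:
        for out in outputs:
            record = {"prompt": out.prompt, "generated": out.outputs[0].text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Writing the results back out as JSONL keeps them easy to feed into a downstream evaluation step.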
-
Implement offline inference with FastGen, using `offline_inference_example.py` from https://github.com/llm-jp/llm-jp-eval/pull/115 as a reference.
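FastGen is exposed through DeepSpeed-MII, so a sketch under that assumption might look like the following; the output format and model name are placeholders, and `mii.pipeline` is the documented entry point for the FastGen engine.

```python
import json


def to_records(prompts, generations):
    """Pair each prompt with its generation as JSON-serializable dicts."""
    return [{"prompt": p, "generated": g} for p, g in zip(prompts, generations)]


def run_fastgen_offline(prompts, output_path, model_name="your-org/your-model"):
    """Run batch generation through DeepSpeed-MII (FastGen) and dump JSONL.

    The import is deferred so the pure helper above stays testable without
    a GPU; model_name is a placeholder.
    """
    import mii  # DeepSpeed-MII, which hosts the FastGen engine

    pipe = mii.pipeline(model_name)
    responses = pipe(prompts, max_new_tokens=256)
    records = to_records(prompts, [r.generated_text for r in responses])
    with open(output_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Keeping the output schema identical to the vLLM variant would let the same evaluation step consume either backend's results.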
-
Create an `offline_inference/README.md` that explains in detail the end-to-end flow of offline inference using the tools under `offline_inference/`. The flow is as follows:
- Run `dump_prompts.py`
- Run an offline inference tool
- Run `evaluate_llm.py` with `offline_dir` specified
If possible, an English version of `offli…
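The three-step flow above can be sketched as a pure-Python simulation; the file names (`prompts.jsonl`, `generations.jsonl`) inside `offline_dir` are illustrative assumptions, not the repository's actual layout.

```python
import json
import os


def dump_prompts(prompts, offline_dir):
    """Stage 1: write the evaluation prompts to <offline_dir>/prompts.jsonl."""
    os.makedirs(offline_dir, exist_ok=True)
    path = os.path.join(offline_dir, "prompts.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for p in prompts:
            f.write(json.dumps({"prompt": p}, ensure_ascii=False) + "\n")
    return path


def run_offline_inference(offline_dir, generate):
    """Stage 2: run any offline inference backend over the dumped prompts.

    `generate` stands in for a vLLM/FastGen call that maps prompt -> text.
    """
    with open(os.path.join(offline_dir, "prompts.jsonl"), encoding="utf-8") as f:
        prompts = [json.loads(line)["prompt"] for line in f]
    out_path = os.path.join(offline_dir, "generations.jsonl")
    with open(out_path, "w", encoding="utf-8") as f:
        for p in prompts:
            rec = {"prompt": p, "generated": generate(p)}
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return out_path


def evaluate(offline_dir):
    """Stage 3: read the generations back, as an evaluator pointed at
    offline_dir would, instead of calling the model online."""
    with open(os.path.join(offline_dir, "generations.jsonl"), encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Because each stage communicates only through files in `offline_dir`, the inference backend can be swapped without touching the dump or evaluation steps.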
-
Gemini Nano weights from Google Chrome are on [HuggingFace](https://huggingface.co/wave-on-discord/gemini-nano). You can run inference with this model using [MediaPipe LLM inference](https://githu…
-
I completed the installation of DistServe. When I tried to run `offline.py` with my downloaded Llama 2 model, I encountered the following problem:
Traceback (most recent call last):
File "/hom…
-
Running `python offline_inference.py` from [link](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/vLLM-Serving) for vLLM offline inference on CPU fails. It seems that `llm…
-
**Checklist**
- [x] I made sure that there are no existing issues - open or closed - to which I could contribute my information.
- [x] I have read the FAQ and my problem isn't listed.
- [x] I …
-
### 🚀 The feature, motivation and pitch
We currently do not apply a chat template for the offline `LLM` class. It might be useful to provide an interface similar to the Hugging Face chat pipeline to utilize/ac…
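As a sketch of the idea, chat messages could be rendered into a single prompt string before being handed to the offline `LLM` class. The template below is a hypothetical minimal one for illustration only; a real implementation would defer to each model's own template via `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
def apply_chat_template(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} messages into a prompt.

    Hypothetical minimal template: each message becomes a <|role|> block,
    optionally followed by an assistant header to cue generation.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    prompt = "\n".join(parts)
    if add_generation_prompt:
        prompt += "\n<|assistant|>\n"
    return prompt
```

The resulting string could then be passed to the offline `generate` call as an ordinary prompt, which is what the proposed chat interface would do internally.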
-
After running ChatGLM-6B failed with an error (see the issue "running ChatGLM-6B from the tutorial reports a grpc error"), we found that VGPU-CORE resources were insufficient, even though the eggroll dashboard showed a normal amount of allocatable VGPU-CORE resources.
Only after manually editing the node and processor manage tables in MySQL to clear the VGPU-CORE records pre-allocated by the DeepSpeed task could we once again sub…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I am concerned about data privacy. For example, if we are using paid LLMs via an API lik…