-
## 🚀 Feature Request
Use streamlit to build a web-based AI search engine with the following capabilities:
- answers can include references
- suggest "questions you may also ask"
----
## Reference 1
SOP for building a vertical-domain AI search engine 👇
# Identify three core questions:
1. source list: which sources to retrieve data from
2. answ…
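A minimal sketch of the two requested capabilities, kept independent of any particular LLM backend. The function names and the source/topic structures are hypothetical; in a Streamlit app these would be wired to `st.text_input` for the query and `st.markdown` for the rendered answer.

```python
# Hypothetical helpers for the two requested features: numbered references
# appended to an answer, and "questions you may also ask" suggestions.

def format_answer_with_references(answer: str, sources: list[dict]) -> str:
    """Append a numbered reference list (title + URL) to the answer text."""
    lines = [answer, "", "References:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']} - {src['url']}")
    return "\n".join(lines)

def suggest_followups(question: str, topics: list[str]) -> list[str]:
    """Naive follow-up generator: one suggested question per related topic."""
    return [f"How does {t} relate to: {question}?" for t in topics]

if __name__ == "__main__":
    out = format_answer_with_references(
        "Streamlit lets you build web UIs in pure Python.",
        [{"title": "Streamlit docs", "url": "https://docs.streamlit.io"}],
    )
    print(out)
    print(suggest_followups("how do AI search engines work?", ["RAG"]))
```

A real implementation would fill `sources` from the retrieval step (the "source list" above) and derive `topics` from the retrieved documents or the LLM itself.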
-
I think there is a bug [here](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm_bls/1/model.py#L120) in the implementation of bls ba…
-
### What happened?
After last week's updates, llama-cli (formerly main) either chats with itself, outputs random tokens, or stops answering altogether. The problem is the same on CPU and on NVIDIA GPUs…
-
So we're having issues inferencing efficiently at scale. Of course we're processing the audio parts one by one, as is the default for inference, but is there any support for batch inference to speed th…
-
[meta engineering blog post](https://engineering.fb.com/2024/06/12/data-infrastructure/training-large-language-models-at-scale-meta/)
- Meta requires massive computational power to train large lang…
-
Hi,
It would be great to have MLX support in Axolotl. MLX has been shown to finetune many LLMs quickly and efficiently, including 7B models on consumer hardware.
Thank you!
(edit: [update]…
-
### 🚀 The feature, motivation and pitch
Hi all, I was wondering if it's possible to do precise model device placement. For example, I would like to place the vLLM model on GPU 1 and let GPU 0 do othe…
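A common workaround, rather than an official vLLM placement API: hide every GPU except the desired one via `CUDA_VISIBLE_DEVICES` before CUDA is initialized, so vLLM enumerates only GPU 1 (as `cuda:0`) while other processes keep GPU 0. The helper name below is hypothetical.

```python
import os

def pin_to_gpu(index: int) -> None:
    """Hide all GPUs except `index`; must run before torch/vllm is imported,
    since CUDA device enumeration happens at initialization time."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)

pin_to_gpu(1)
# from vllm import LLM   # imported afterwards, vLLM sees GPU 1 as cuda:0
# llm = LLM(model="facebook/opt-125m")
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints: 1
```

The limitation is that this is per-process: it cannot place the vLLM model and other work on different GPUs within the same process.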
-
Efficient Streaming Language Models with Attention Sinks [paper](https://arxiv.org/abs/2309.17453)
These repos have already implemented it:
[attention_sinks](https://github.com/tomaarsen/attention_si…
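A minimal sketch of the cache policy the paper describes: when the KV cache is full, always keep the first few "attention sink" tokens plus a sliding window of the most recent tokens, evicting the middle. The function below operates on token positions only, as an illustration of the eviction rule, not the linked repos' actual implementation.

```python
def evict(cache: list[int], n_sink: int, window: int) -> list[int]:
    """Return the cached token positions kept after sink-aware eviction:
    the first `n_sink` positions plus the last `window` positions."""
    if len(cache) <= n_sink + window:
        return cache  # cache not full yet, nothing to evict
    return cache[:n_sink] + cache[-window:]

positions = list(range(10))               # 10 cached token positions
kept = evict(positions, n_sink=2, window=4)
print(kept)  # [0, 1, 6, 7, 8, 9]
```

The paper's observation is that keeping those initial sink tokens (rather than a plain sliding window) is what preserves generation quality at long context lengths.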
-
Would it be possible to enhance the detection capability of InternVL by incorporating more data combined with grounding instructions during the fine-tuning stage?
-
Is it possible to interpret Swift code with this somehow? That would be very useful for mobile app development.