-
## 🚀 Feature Request
Use streamlit to build a web-based AI search engine with the following capabilities:
- answers can include references
- suggest "questions you may also ask"
----
## Reference 1
SOP for building a vertical-domain AI search engine 👇
# Identify three core questions:
1. source list: which sources to retrieve data from
2. answ…
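A minimal sketch of the two requested capabilities, kept independent of any particular LLM backend. The function names and the source/topic structures are hypothetical; in a Streamlit app these would be wired to `st.text_input` for the query and `st.markdown` for the rendered answer.

```python
# Hypothetical helpers for the two requested features: numbered references
# appended to an answer, and "questions you may also ask" suggestions.

def format_answer_with_references(answer: str, sources: list[dict]) -> str:
    """Append a numbered reference list (title + URL) to the answer text."""
    lines = [answer, "", "References:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']} - {src['url']}")
    return "\n".join(lines)

def suggest_followups(question: str, topics: list[str]) -> list[str]:
    """Naive follow-up generator: one suggested question per related topic."""
    return [f"How does {t} relate to: {question}?" for t in topics]

if __name__ == "__main__":
    out = format_answer_with_references(
        "Streamlit lets you build web UIs in pure Python.",
        [{"title": "Streamlit docs", "url": "https://docs.streamlit.io"}],
    )
    print(out)
    print(suggest_followups("how do AI search engines work?", ["RAG"]))
```

A real implementation would fill `sources` from the retrieval step (the "source list" above) and derive `topics` from the retrieved documents or the LLM itself.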
-
I think there is a bug [here](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm_bls/1/model.py#L120) in the implementation of bls ba…
-
### What happened?
After last week's updates, llama-cli (formerly main) either chats with itself, outputs random tokens, or stops answering altogether. The problem is the same on CPU and on NVIDIA GPUs…
-
So we're having issues inferencing efficiently at scale. Of course we're processing the audio parts one by one, as is the default for inference, but is there any support for batch inference to speed th…
-
[meta engineering blog post](https://engineering.fb.com/2024/06/12/data-infrastructure/training-large-language-models-at-scale-meta/)
- Meta requires massive computational power to train large lang…
-
Hi,
It would be great to have MLX support in Axolotl. MLX has been shown to finetune many LLMs quickly and efficiently, including 7B models on consumer hardware.
Thank you!
(edit: [update]…
-
### 🚀 The feature, motivation and pitch
Hi all, I was wondering if it's possible to do precise model device placement. For example, I would like to place the vLLM model on GPU 1 and let GPU 0 do othe…
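A common workaround, rather than an official vLLM placement API: hide every GPU except the desired one via `CUDA_VISIBLE_DEVICES` before CUDA is initialized, so vLLM enumerates only GPU 1 (as `cuda:0`) while other processes keep GPU 0. The helper name below is hypothetical.

```python
import os

def pin_to_gpu(index: int) -> None:
    """Hide all GPUs except `index`; must run before torch/vllm is imported,
    since CUDA device enumeration happens at initialization time."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)

pin_to_gpu(1)
# from vllm import LLM   # imported afterwards, vLLM sees GPU 1 as cuda:0
# llm = LLM(model="facebook/opt-125m")
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints: 1
```

The limitation is that this is per-process: it cannot place the vLLM model and other work on different GPUs within the same process.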
-
Efficient Streaming Language Models with Attention Sinks [paper](https://arxiv.org/abs/2309.17453)
These repos have already implemented it:
[attention_sinks](https://github.com/tomaarsen/attention_si…
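A minimal sketch of the cache policy the paper describes: when the KV cache is full, always keep the first few "attention sink" tokens plus a sliding window of the most recent tokens, evicting the middle. The function below operates on token positions only, as an illustration of the eviction rule, not the linked repos' actual implementation.

```python
def evict(cache: list[int], n_sink: int, window: int) -> list[int]:
    """Return the cached token positions kept after sink-aware eviction:
    the first `n_sink` positions plus the last `window` positions."""
    if len(cache) <= n_sink + window:
        return cache  # cache not full yet, nothing to evict
    return cache[:n_sink] + cache[-window:]

positions = list(range(10))               # 10 cached token positions
kept = evict(positions, n_sink=2, window=4)
print(kept)  # [0, 1, 6, 7, 8, 9]
```

The paper's observation is that keeping those initial sink tokens (rather than a plain sliding window) is what preserves generation quality at long context lengths.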
-
Would it be possible to enhance the detection capability of InternVL by incorporating more data combined with grounding instructions during the fine-tuning stage?
-
Is it possible to interpret Swift code with this somehow? That would be very useful for mobile app development.