-
### Your current environment
```text
vllm=0.5.4
```
```python
llm = LLM(
    model=MODEL_NAME,
    trust_remote_code=True,
    gpu_memory_utilization=0.5,
    max_model_len=2048,
    tensor_paralle…
```
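For reference, here is a minimal sketch of what a complete vLLM constructor call along these lines typically looks like; the model id, the `tensor_parallel_size` value, and the generation call are placeholders I am assuming, not the reporter's actual settings.

```python
from vllm import LLM, SamplingParams

# Hypothetical, self-contained version of the setup above; values are placeholders.
MODEL_NAME = "facebook/opt-125m"  # assumed model id, not the reporter's

llm = LLM(
    model=MODEL_NAME,
    trust_remote_code=True,
    gpu_memory_utilization=0.5,   # fraction of GPU memory vLLM is allowed to use
    max_model_len=2048,           # maximum context length
    tensor_parallel_size=1,       # assumed; number of GPUs to shard the model across
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```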
-
```text
python run_awq.py --model_name Qwen/Qwen1.5-7B-Chat --task quantize
Namespace(model_name='Qwen/Qwen1.5-7B-Chat', target='aie', profile_layer=False, task='quantize', precision='w4abf16', flash_attenti…
```
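For illustration only, a rough argparse sketch that would produce a Namespace like the one printed above; the flag names are taken from that output, while the defaults and any omitted flags are my assumptions rather than the script's actual definition.

```python
import argparse

# Hypothetical reconstruction of run_awq.py's argument parser based on the
# Namespace printed above; defaults are assumed, remaining flags are omitted.
parser = argparse.ArgumentParser(description="AWQ quantization runner (sketch)")
parser.add_argument("--model_name", default="Qwen/Qwen1.5-7B-Chat")
parser.add_argument("--target", default="aie")
parser.add_argument("--profile_layer", action="store_true")
parser.add_argument("--task", default="quantize")
parser.add_argument("--precision", default="w4abf16")

args = parser.parse_args()
print(args)  # e.g. Namespace(model_name='Qwen/Qwen1.5-7B-Chat', target='aie', ...)
```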
-
### Checklist
- [X] I have searched the existing issues
### Is your feature request related to a problem? Please describe it
Please add a web search feature to it. I think the DuckDuckGo API will be best for …
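For context, here is a minimal sketch of what such a web-search call could look like with the community `duckduckgo_search` package; the package choice, the query, and the result handling are assumptions on my part, not part of the original request.

```python
from duckduckgo_search import DDGS

# Hypothetical web-search helper built on the duckduckgo_search package;
# the query string and the fields used below are placeholders.
def web_search(query: str, max_results: int = 5):
    with DDGS() as ddgs:
        # each result is a dict with 'title', 'href', and 'body' keys
        return list(ddgs.text(query, max_results=max_results))

for result in web_search("retrieval augmented generation"):
    print(result["title"], result["href"])
```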
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I created two RetrieverTools for retrieving and answering specific questions, but for ot…
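For context, a rough sketch of how two RetrieverTools are typically constructed in LlamaIndex; the data paths, index setup, and tool descriptions below are placeholders I am assuming, not the asker's actual code.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import RetrieverTool

# Hypothetical setup of two retriever tools over separate document sets;
# directory names and descriptions are placeholders.
product_docs = SimpleDirectoryReader("data/product_docs").load_data()
policy_docs = SimpleDirectoryReader("data/hr_policies").load_data()

product_tool = RetrieverTool.from_defaults(
    retriever=VectorStoreIndex.from_documents(product_docs).as_retriever(),
    description="Retrieves passages from the product documentation.",
)
policy_tool = RetrieverTool.from_defaults(
    retriever=VectorStoreIndex.from_documents(policy_docs).as_retriever(),
    description="Retrieves passages about internal HR policies.",
)
```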
-
### Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- [X] I hav…
-
Hi, I've just noticed that by setting use_prefix_cache=True/False, the results can change quite substantially.
Take, for example, this code here:
```python
llm = AutoModelForCausalLM.from_pretra…
-
### Your current environment
vllm version: 0.6.3.post1
### Model Input Dumps
_No response_
### 🐛 Describe the bug
I see on the official Gemma model page, https://huggingface.co/google/gemma-2b, cont…
-
### What you would like to be added?
Inspired by this research paper [Vidur: A Large-Scale Simulation Framework For LLM Inference](https://proceedings.mlsys.org/paper_files/paper/2024/file/b74a8de47d…
-
### Self Checks
- [X] I have searched for [existing issues](https://github.com/langgenius/dify/issues), including closed ones.
- [X] I confirm that I am using English to su…
-
### Organization Name
SWIRL Corporation
### Main office location
235 Bear Hill Rd, Suite 201, Waltham MA 02451
### What regions of the world do you serve?
Global, North America
### Business desc…