-
@mroch @li-boxuan @jeremi @penberg @JensRoland
Integrate a feature that allows users to use multiple LLM models in the project, each with its own area of expertise.
For example:
when a user adds 3…
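One way to realize this kind of feature is a simple expertise router that picks a model per query. The model names and the keyword heuristic below are purely illustrative assumptions, not part of any existing project API:

```python
# Hypothetical sketch: route each query to one of several LLMs by expertise.
# Model identifiers and keyword lists are illustrative assumptions.
EXPERTS = {
    "code": "model-a-code",
    "math": "model-b-math",
    "general": "model-c-general",
}

def route(query: str) -> str:
    """Pick a model id based on simple keyword heuristics."""
    lowered = query.lower()
    if any(k in lowered for k in ("def ", "class ", "bug", "compile")):
        return EXPERTS["code"]
    if any(k in lowered for k in ("integral", "solve", "equation")):
        return EXPERTS["math"]
    return EXPERTS["general"]
```

A production version would likely replace the keyword check with a classifier or a cheap LLM call, but the dispatch structure stays the same.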
-
### Describe your problem
I deployed ragflow (infiniflow/ragflow:v0.9.0) on AWS EKS. I have two nodes running all the dependencies (redis, mysql, minio, elasticsearch).
**Nodes detail:**
RAM: 64 GB…
-
### What is the issue?
Mixtral 8x22b instruct outputs are either empty or gibberish.
I have tried various quantizations: q4, q4_k_m, q5, etc. All seem problematic.
Other models (e.g., llama3, com…
-
### What is the issue?
```
llm_load_tensors: ggml ctx size = 0.13 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 255, got 254
llama_load_model_fro…
```
-
Currently, TensorRT-LLM requires that the LoRA weights' dtype match the base model's dtype. The check is here:
https://github.com/NVIDIA/TensorRT-LLM/blob/9dbc5b38baba399c5517685ecc5b66f57a177a4c/cpp/tensor…
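Until that check is relaxed, a workaround is to cast the LoRA tensors to the base model's dtype before loading them. A minimal sketch using numpy arrays as stand-ins (a real TensorRT-LLM checkpoint would hold torch tensors, and the function name here is hypothetical):

```python
import numpy as np

def align_lora_dtype(lora_weights: dict, base_dtype) -> dict:
    """Cast every LoRA tensor to the base model's dtype (illustrative helper)."""
    return {name: w.astype(base_dtype) for name, w in lora_weights.items()}

# Example: fp32 LoRA weights cast to match an fp16 base model.
lora = {
    "lora_A": np.ones((4, 2), dtype=np.float32),
    "lora_B": np.zeros((2, 4), dtype=np.float32),
}
aligned = align_lora_dtype(lora, np.float16)
```

Note that casting fp32 adapters down to fp16/bf16 can lose precision, which is presumably why the strict check exists in the first place.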
-
Running make after cloning seems to result in the following:
```console
~/llama3.cuda$ make
nvcc -DUSE_CUBLAS=1 -g -o runcuda llama3.cu -lm -lcublas
/usr/include/c++/11/bits/std_function.h:435:145: …
-
```cpp
std::vector routes = {
    {
        "/v1/chat/completions",
        HttpMethod::METHOD_POST,
        std::bind(&handleCompletionsRequest, std::placeholders::_1, &api)
    },
    …
```
-
When I process Chinese text, this **randomly** happens during a query. I checked the response; the answer is actually there. I just can't understand why.
-
Opening a new issue for the previously opened issue here -- https://github.com/huggingface/tokenizers/issues/1517
Here we can see that the desired behavior for `return_offsets_mapping` from Mistral…
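For context, `return_offsets_mapping` is expected to give, for each token, the `(start, end)` character span in the original string. A self-contained sketch of that contract, using a hand-made token list rather than a real tokenizer (so the whitespace-split tokens here are an illustrative assumption):

```python
# Illustrates the (start, end) character-span contract of offset mappings.
text = "Hello world"
tokens = ["Hello", "world"]  # stand-in for a tokenizer's output

offsets = []
pos = 0
for tok in tokens:
    start = text.index(tok, pos)   # locate the token in the source text
    offsets.append((start, start + len(tok)))
    pos = start + len(tok)

# offsets == [(0, 5), (6, 11)]
```

The issue being reported is about whether the spans returned by the Mistral tokenizer follow this contract; the sketch only shows what the contract is.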
-
### Description
![Context 1](https://github.com/modelscope/modelscope-agent/assets/56472384/a1b21f26-04d7-420a-a3b2-a5085300f243)
![Context 2](https://github.com/modelscope/modelscope-agent/assets/56472384/32…