-
This could be a question rather than a feature request.
flashinfer is not supported on AMD GPUs, and support is not currently planned until a [later version](https://github.com/flashinfer-ai/flashinfer/iss…
-
## ❓ General Questions
After generating the Android MLCChat app based on the model gemma-2-2b-it-q4f16_1 and installing it on my device, I found that the chatbot seems not to retain previous co…
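For context, multi-turn chat normally works by resending the accumulated conversation history with every request; if the app drops that history between turns, the model has no memory of earlier exchanges. A minimal sketch of the idea (hypothetical structure, not the actual MLCChat API):

```python
# Minimal sketch of multi-turn history handling: each new turn is
# appended to the running history, and the full history is what the
# model would see on the next request.
# Hypothetical structure, not the actual MLCChat API.
def add_turn(history, role, text):
    """Return a new history list with one more (role, text) turn."""
    return history + [(role, text)]

history = []
history = add_turn(history, "user", "My name is Alice.")
history = add_turn(history, "assistant", "Nice to meet you, Alice!")
history = add_turn(history, "user", "What is my name?")
# If history were reset to [] between turns, the model could not
# answer "Alice" here.
```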
-
While the Replicate API approach allows you to select which version of flux to run, the local approach defaults to the dev model. Can you make it so that schnell can be run locally? Additionally, the …
-
Hello, I have two simple questions:
1. https://huggingface.co/BAAI/bge-reranker-large "We have updated the [new reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker), supporting larger…
-
**Describe the bug**
gemma2 2b is not available for selection in the models download menu
https://ollama.com/library/gemma2:2b
-
Environment: A100
When using Gemma2, I load the model as shown below and run inference with different max_batch_size values; the inference dataset contains 1000 samples (need_infer).
`pipe = lmdeploy.pipeline(model_path=model_id,backend_config=TurbomindEngineConfig(max_batch_size=256, cache_max_entry_…
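The effect of `max_batch_size` on the 1000 requests can be sketched as plain batching logic; the helper below is illustrative only, not part of the lmdeploy API:

```python
# Hypothetical sketch: split the 1000 inference requests into chunks
# no larger than max_batch_size before handing them to the pipeline.
# Names here (chunk_requests, need_infer) are illustrative, not lmdeploy API.
def chunk_requests(requests, max_batch_size):
    """Yield successive batches of at most max_batch_size requests."""
    for i in range(0, len(requests), max_batch_size):
        yield requests[i:i + max_batch_size]

need_infer = [f"prompt-{i}" for i in range(1000)]
batches = list(chunk_requests(need_infer, 256))
# 1000 requests with max_batch_size=256 -> 4 batches (256, 256, 256, 232)
```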
-
It seems the latest changes for supporting ShieldGemma (the Gemma 2 classification model) aren't working in 0.8.0. I have the dependency and copied your example verbatim, but I still get:
```
C…
-
Similar to this issue: https://github.com/s-kostyaev/ellama/issues/5
I'm using the ellama defaults as far as I can tell (using the `llm-client` layer in Spacemacs, but I did try `emacs -q` and package …
-
**Steps to reproduce**
1. Run gemma2 from ollama
**Result**
The model instance is in an error state
**Expected behavior**
It should be running
**Environment**
- GPUStack version: 0…
-
### System Info
I'm using SageMaker to run fine-tuning on an ml.g5.48xlarge with this requirements file:
```
transformers==4.44.2
datasets==3.0.0
accelerate==0.34.2
bitsandbytes==0.44.0
hugging…