-
I tried to do it on a VPS, but it doesn't work.
Should I consider updating the source code?
Appreciate your help.
-
Hi there, I am able to download the model from HF using `VideoLlavaForConditionalGeneration.from_pretrained` and optimize it using `ipex-llm.optimize_model()`, but the process fails on `generate()` with …
-
T2V is planned to enable inference for models like Stable Diffusion on CPU/GPU and training on Habana Gaudi/DG2, as well as to improve generated-video quality, e.g. more realistic frames and better coherency…
-
### Your current environment
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Debian GNU/Lin…
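A "PyTorch version: N/A" line in this environment dump usually means `torch` could not be imported at all in that Python environment. As a minimal sketch (the helper name is my own, not from the report), one quick way to confirm:

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    # find_spec returns None when the package cannot be located on sys.path
    return importlib.util.find_spec(pkg) is not None

print(is_installed("torch"))
```

If this prints `False`, the collect-env script will report N/A for every PyTorch field, and the fix is an install/environment problem rather than a library bug.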
-
I only have a 16 GB graphics card, so I used the CPU to run it. My code is:
```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cpu"
raw_ima…
```
-
Hello!
Does TensorRT-LLM support Medusa with Mixtral 8x7B?
My understanding is that right now the Medusa [convert_checkpoint.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/medusa/c…
-
Hi,
I am using `Python 3.9` and `CUDA 12.2`. I installed the required packages listed in the README. I also changed `model_name_or_path` to `--model_name_or_path google/flan-t5-xl \` and `C…
-
### What is the issue?
**I got this error:**
```
root@bccf6f1eb00f:/data/models# ollama create gte_qwen2:7b -f Modelfile
transferring model data
Error: invalid file magic
```
**This is my Modelfile:**
…
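"invalid file magic" from `ollama create` generally indicates that the model file referenced in the Modelfile is not actually in GGUF format (e.g. it is a `.safetensors` or raw PyTorch checkpoint instead). A GGUF file begins with the four ASCII bytes `GGUF`; a minimal sketch to check that (the function name is my own):

```python
def has_gguf_magic(path: str) -> bool:
    # GGUF files begin with the 4-byte magic b"GGUF"
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

If this returns `False` for the file named in the Modelfile's `FROM` line, the importer is likely to reject it with this error, and the file needs to be converted to GGUF first.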
-
**Describe the current behavior**
I'm using a T4 runtime to do some work with LLMs, and after a few hours the runtime just says "Connecting" and the bottom of the screen says "waiting to finish th…
-
On a powerful GPU like the 4090, is it normal for a single generation with a 7B model to take about 40 seconds?
That seems too slow.
model:
```python
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llam…
```
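For scale: 40 s per generation implies single-digit tokens per second, which is characteristic of CPU-only inference; with the GGUF layers actually offloaded to the 4090, a 7B model typically runs much faster. A quick wall-clock throughput calculation (the token count below is a made-up example, not from the report):

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    # Simple wall-clock throughput: generated tokens divided by elapsed time.
    return n_tokens / seconds

# Hypothetical: a 256-token completion taking 40 s
print(tokens_per_second(256, 40.0))  # 6.4
```

Comparing this number against the speed reported by the backend's own logs helps confirm whether generation is running on the GPU at all.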