-
Hello, due to the network environment restrictions of the development machine, it cannot access [huggingface.co](http://huggingface.co/) to download and load the model. I have downloaded the VLM2Vec-fu…
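In case it helps, a common workaround is to point `from_pretrained` at the local snapshot directory and pass `local_files_only=True` so nothing is fetched from the Hub. A minimal sketch, assuming the checkpoint was copied to a hypothetical local path:
```python
from transformers import AutoModel, AutoProcessor

local_path = "/models/VLM2Vec-full"  # hypothetical local snapshot directory
model = AutoModel.from_pretrained(
    local_path, trust_remote_code=True, local_files_only=True
)
processor = AutoProcessor.from_pretrained(
    local_path, trust_remote_code=True, local_files_only=True
)
```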
-
Hello Unsloth Team,
I am trying to finetune the **dwb2023/phi-3-vision-128k-instruct-quantized** model using Unsloth, but I encountered a NotImplementedError. The error message indicates that this …
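For reference, a minimal reproduction sketch of the load step where Unsloth typically raises for unsupported architectures (the `load_in_4bit` flag is an assumption based on the checkpoint being pre-quantized, and `FastVisionModel` requires a recent Unsloth build):
```python
from unsloth import FastVisionModel  # vision support landed in newer Unsloth releases

# Assumption: the NotImplementedError is raised here, at model load time.
model, tokenizer = FastVisionModel.from_pretrained(
    "dwb2023/phi-3-vision-128k-instruct-quantized",
    load_in_4bit=True,  # assumption: matches the pre-quantized checkpoint
)
```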
-
Thanks so much for your work on this!
How can I deploy this fine-tuned model (expose it via an API endpoint)? Can I use vLLM or a library like this: https://github.com/EricLBuehler/mistral.rs, which sup…
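If vLLM supports the architecture, one common pattern is to serve the merged fine-tuned weights with vLLM's OpenAI-compatible server and query it with any OpenAI client. A minimal sketch, assuming merged weights in a hypothetical `./phi3v-finetuned` directory and the default port:
```python
# Serve first (shell):
#   python -m vllm.entrypoints.openai.api_server --model ./phi3v-finetuned
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="./phi3v-finetuned",  # hypothetical path to the merged weights
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```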
-
Using the phi3 vision model:
A client sees this warning on their machine during text generation; generation still works:
```
Unable to create a device from version 1.614.0 of the DirectX 12 Agility SDK.
You can s…
```
-
```python
from typing import Optional

def _save(self, output_dir: Optional[str] = None, state_dict=None):
    # If we are executing this function, we are the process zero, so we don't check for that.
    output_dir = output_dir if output_dir is not None else self.args.output_dir
```
-
### Motivation
The latest release of the Microsoft Phi-3 Vision model, with 128k context, looks promising in performance and is resource-saving too, as it boasts just 4.2B parameters. So it would be a great f…
-
I am running a full fine-tune of InternVL2 on video data with a single 32 GB V100, and it fails with torch.cuda.OutOfMemoryError: CUDA out of memory.
```
torchrun /cache/InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py \
--model_name…
```
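Not a fix for this specific script, but the usual levers for OOM at this scale, expressed as standard transformers `TrainingArguments` (the InternVL script exposes its own flags, so treat these names as illustrative only):
```python
from transformers import TrainingArguments

# Common OOM mitigations; flag names are standard transformers ones,
# not InternVL-specific.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # smallest micro-batch
    gradient_accumulation_steps=16,   # preserve the effective batch size
    gradient_checkpointing=True,      # trade recompute for activation memory
    fp16=True,                        # V100 has no bf16 support
)
```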
-
### What is the issue?
My initial goal is to check whether a specific model is available using the Ollama API.
I use the OpenAI library `github.com/sashabaranov/go-openai` to do that.
The problem is when I …
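For comparison, the same availability check against Ollama's OpenAI-compatible endpoint looks like this in Python (the base URL assumes Ollama's default port, and the model tag is a placeholder):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
available = {m.id for m in client.models.list().data}
print("llama3:latest" in available)  # placeholder model tag
```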
-
## ❓ General Questions
I tried to compile TVM and MLC-LLM on a Jetson Orin AGX (JetPack 6, CUDA 12.2) in order to run inference with phi3.5v. However, I discovered that phi3 processes images much more slowly than the Hugging Face …
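One way to narrow down where the time goes is to benchmark the image preprocessing step in isolation on the same device. A minimal sketch using the public Hugging Face checkpoint (the image size and prompt are arbitrary):
```python
import time
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct", trust_remote_code=True
)
img = Image.new("RGB", (1344, 1344))  # synthetic test image
t0 = time.perf_counter()
inputs = processor(text="<|image_1|>\nDescribe.", images=[img], return_tensors="pt")
print(f"preprocessing took {time.perf_counter() - t0:.3f}s")
```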
-
System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13
Question 1: How can I use the OpenAI-compatible API to perform inference on a TensorRT engine model?
```
root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/apps# pyt…
```
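Assuming one of the OpenAI-compatible server examples under `examples/apps` is running locally (the port and model name below are placeholders), the engine can then be queried like any OpenAI-style endpoint:
```python
import requests

# Placeholder port and model name; adjust to however the server was launched.
payload = {"model": "engine", "prompt": "Hello", "max_tokens": 32}
r = requests.post("http://localhost:8000/v1/completions", json=payload, timeout=60)
print(r.json()["choices"][0]["text"])
```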