From the log, TensorRT-LLM works fine. This may be a model accuracy issue. Can you get the expected outputs by running the inference code on the HF model page?
> Can you get the expected outputs by running the inference code on the HF model page?
Running the HF example, I get the output: "The image shows a cityscape with a prominent building that resembles the Marina Bay Sands hotel in Singapore. The presence of the Merlion statue and the unique architectural style of the building suggest that this is Singapore."
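For reference, the HF baseline follows the snippet on the model card, roughly as below (the image URL here is a placeholder; substitute the Merlion example image):

```python
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Placeholder URL -- substitute the Merlion example image from the docs.
url = "https://example.com/merlion.png"
image = Image.open(requests.get(url, stream=True).raw)

messages = [{"role": "user", "content": "<|image_1|>\nWhich city is this?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generate_ids = model.generate(
    **inputs, max_new_tokens=500, eos_token_id=processor.tokenizer.eos_token_id
)
generate_ids = generate_ids[:, inputs["input_ids"].shape[1]:]  # strip the prompt
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```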
any updates?
Same issue. Thanks for reporting, @eoastafurov.
@byshiue @kaiyux
I believe this issue might be related to image preprocessing and/or the ptuning_setup_phi3 function (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/multimodal_model_runner.py#L838). It seems the LLM itself is functioning correctly but isn't receiving the image correctly.
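To illustrate what I mean, here is a rough sketch (not the actual TRT-LLM code; the function name is mine) of the prompt-tuning trick these multimodal runners use to feed image embeddings to the LLM. If the feature/placeholder alignment here goes wrong, the LLM decodes against garbage embeddings:

```python
import torch

def splice_image_embeddings(input_ids, visual_features, vocab_size):
    # Sketch only: image placeholder positions (Phi-3-vision marks them
    # with negative token ids) are rewritten to "fake" ids >= vocab_size,
    # which the engine resolves against a prompt-embedding table built
    # from the vision encoder output instead of the vocab embedding.
    num_img_tokens = visual_features.shape[0]
    img_positions = (input_ids < 0).nonzero(as_tuple=True)[0]
    assert len(img_positions) == num_img_tokens, "feature/placeholder mismatch"
    fake_ids = torch.arange(vocab_size, vocab_size + num_img_tokens)
    patched_ids = input_ids.clone()
    patched_ids[img_positions] = fake_ids
    prompt_table = visual_features  # handed to the engine as the ptuning table
    return patched_ids, prompt_table
```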
I tested the same example image with the input text:
""Can you describe in detail what you observe in this image? Is it white noise, a completely black screen, or something else? Please provide as much information as possible about it.""
The output I received was:
The image provided appears to be a solid color with no discernible patterns, text, or objects. It is not white noise, as white noise would typically have a grainy texture and a random distribution of light and dark areas. It is also not a completely black screen, as there is no visible content or display on the screen. The color of the image is a uniform, light shade, possibly white or a very light gray, with no variation across the entire image. Without additional context or variations in the image, it is not possible to provide a more detailed description.
Can someone confirm if they've run the example from the documentation and received correct outputs?
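One way to confirm whether the vision path (rather than the LLM) is at fault would be to dump the visual features from both sides and compare them. A minimal sketch, assuming you save the TRT visual engine output and the HF vision tower output for the same preprocessed image (file names are placeholders):

```python
import torch

# Assumes both feature tensors were dumped for the same image,
# e.g. via torch.save() inside run.py and inside the HF forward pass.
trt_feats = torch.load("trt_visual_features.pt").flatten().float()
hf_feats = torch.load("hf_visual_features.pt").flatten().float()

cos = torch.nn.functional.cosine_similarity(trt_feats, hf_feats, dim=0)
max_abs = (trt_feats - hf_feats).abs().max()
print(f"cosine similarity: {cos.item():.4f}, max abs diff: {max_abs.item():.4f}")
# Near-1.0 similarity points at preprocessing/ptuning; a low value means
# the visual engine itself is producing wrong features.
```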
Same observations.
Sorry for the late response, I can reproduce this issue and we are fixing it.
any updates?
This is actually a TensorRT issue; it is still a work in progress.
Hi, just checking to see if there is any good news.
any updates?
Similar to this issue: https://github.com/NVIDIA/TensorRT-LLM/issues/2369#issuecomment-2435782359
We found there are some compatibility issues between Hugging Face and torch.onnx, and we are trying to find a workaround from the TRT-LLM side.
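For context on where that bites: build_visual_engine.py wraps the HF vision encoder in a small module and pushes it through torch.onnx.export before TensorRT consumes it, roughly like this (the wrapped module below is a stand-in so the sketch runs on its own):

```python
import torch

class VisionWrapper(torch.nn.Module):
    # Single-tensor forward so torch.onnx.export can trace the vision tower.
    def __init__(self, vision_tower):
        super().__init__()
        self.vision_tower = vision_tower

    def forward(self, pixel_values):
        return self.vision_tower(pixel_values)

# Stand-in for the real HF vision encoder; shapes are illustrative.
tower = torch.nn.Conv2d(3, 8, kernel_size=3)
wrapper = VisionWrapper(tower).eval()
dummy = torch.randn(1, 3, 336, 336)
torch.onnx.export(
    wrapper, dummy, "visual_encoder.onnx", opset_version=17,
    input_names=["pixel_values"], output_names=["features"],
    dynamic_axes={"pixel_values": {0: "batch"}},
)
```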
See https://github.com/NVIDIA/TensorRT-LLM/issues/2369#issuecomment-2455888795. Closing for now; next week's main branch update will contain the workaround fix.
System Info
Hello TensorRT-LLM team! 👋 I'm facing an issue where the inference output does not contain the expected "Singapore" text. Below are the details of my setup and steps to reproduce the issue.
🔧 System Information:
- CPU architecture: x86_64 (AMD EPYC 7642)
- Memory: 512 GB
- GPU: NVIDIA A30
- TensorRT-LLM version: 0.13.0.dev2024090300
- Docker: v24.0.6
- NVIDIA driver: 560.35.03
- OS: Ubuntu 22.04
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
🐳 Dockerfile:
📋 requirements.txt:
⚙️ Commands to Reproduce the Environment:
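The exact commands did not survive the copy; a typical flow (image name and mounts assumed) looks like:

```bash
# Illustrative reconstruction -- not the exact original commands.
docker build -t trtllm-phi3 .
docker run --gpus all --ipc=host -it -v "$(pwd)":/workspace trtllm-phi3 bash
```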
📜 Pip Freeze Inside Container:
📝 Steps to Reproduce the Bug Inside the Container:
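The original steps were also lost in the copy; the flow below follows the examples/multimodal README for Phi-3-vision, with flags and paths as a best-guess reconstruction rather than the exact invocation:

```bash
# Best-guess reconstruction from examples/multimodal -- flags/paths may differ.
python examples/phi/convert_checkpoint.py \
    --model_dir Phi-3-vision-128k-instruct --output_dir ./ckpt --dtype float16
trtllm-build --checkpoint_dir ./ckpt --output_dir ./llm_engine \
    --gemm_plugin float16 --max_batch_size 1 \
    --max_input_len 4096 --max_seq_len 4608 --max_multimodal_len 4096
python examples/multimodal/build_visual_engine.py \
    --model_type phi-3-vision --model_path Phi-3-vision-128k-instruct \
    --output_dir ./visual_engine
python examples/multimodal/run.py \
    --hf_model_dir Phi-3-vision-128k-instruct \
    --visual_engine_dir ./visual_engine --llm_engine_dir ./llm_engine \
    --input_text "Which city is this?"
```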
Expected behavior
Outputs should contain "Singapore"
actual behavior
❗ Unexpected Output of run.py (Missing "Singapore"):
additional notes
❓ Issue Description:
The model should correctly infer that the image depicts Singapore, specifically by recognizing the Merlion in the image. However, instead of "Singapore", the model outputs a generic response indicating an inability to provide location details.