lmcl90 opened this issue 1 month ago
Yes, you can try to use TensorRT-LLM for the vision encoders. We have a Bert example and a DiT example, and the community has also contributed an SDXL model. I think it's not hard to develop a ViT model.
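To make the suggestion above concrete: a ViT vision encoder is structurally the same bidirectional (non-causal) transformer encoder used in the Bert example, which is why adapting it should be straightforward. The following is a plain NumPy sketch of one pre-norm ViT encoder block, not TensorRT-LLM code; all weights here are random stand-ins for trained parameters, purely to illustrate the layer structure one would port.

```python
# Illustrative NumPy sketch (NOT the TensorRT-LLM API): one pre-norm ViT
# encoder block. Structurally identical to a BERT layer, except it runs on
# image patch tokens and uses no causal mask.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(t):
    return (t - t.mean(-1, keepdims=True)) / (t.std(-1, keepdims=True) + 1e-6)

def vit_encoder_block(x, heads, rng):
    """LN -> multi-head self-attention -> residual, LN -> MLP -> residual."""
    n, d = x.shape
    dh = d // heads
    # Random projections stand in for trained weights (illustration only).
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    h = layer_norm(x)
    q = (h @ wq).reshape(n, heads, dh).transpose(1, 0, 2)
    k = (h @ wk).reshape(n, heads, dh).transpose(1, 0, 2)
    v = (h @ wv).reshape(n, heads, dh).transpose(1, 0, 2)
    # Bidirectional attention: every patch attends to every patch.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh)) @ v
    x = x + attn.transpose(1, 0, 2).reshape(n, d) @ wo
    # 2-layer feed-forward network (ReLU as a stand-in activation).
    w1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
    w2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)
    h = layer_norm(x)
    x = x + np.maximum(h @ w1, 0) @ w2
    return x

rng = np.random.default_rng(0)
patches = rng.standard_normal((196, 64))  # 14x14 patch tokens, hidden dim 64
out = vit_encoder_block(patches, heads=8, rng=rng)
print(out.shape)  # (196, 64)
```

Since the attention here has no causal mask, the whole image is processed in a single "context" pass, which is the step a fused context-phase attention kernel would accelerate.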
@QiJune Thanks for your reply. I will give it a try.
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.
I see that the multi-modal models in the examples all use TensorRT directly to deploy vision encoders. Why not use TensorRT-LLM? Are there known issues or challenges with integrating Context FMHA into vision encoders?