TianzhongSong opened this issue 3 weeks ago
Thank you for your question. Unfortunately, cross-attention doesn't support Medusa yet, and we don't plan to add this feature. However, if you can share more details about your use case, we would be happy to consider it further.
TRT-LLM version: v0.11.0
I'm deploying a BART model with Medusa heads. I noticed this issue https://github.com/NVIDIA/TensorRT-LLM/issues/1946, so I adapted my model with the following steps:
However, I encountered the following error:
Can cross-attention not be used with Medusa? Any ideas?
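For background on why this comes up: Medusa works by attaching extra lightweight prediction heads on top of the decoder's final hidden state, with each head drafting one additional future token. That mechanism was designed around decoder-only models, which may be part of why an encoder-decoder model with cross-attention, like BART, isn't supported. A minimal, framework-free sketch of the head mechanism (all names and sizes are illustrative; this is not TRT-LLM code):

```python
import random

random.seed(0)
HIDDEN, VOCAB, NUM_HEADS = 8, 16, 3  # toy sizes for illustration

def make_head():
    # A head here is just a HIDDEN x VOCAB weight matrix projecting
    # the hidden state to vocabulary logits.
    return [[random.gauss(0, 1) for _ in range(VOCAB)] for _ in range(HIDDEN)]

lm_head = make_head()                                # predicts token t+1
medusa_heads = [make_head() for _ in range(NUM_HEADS)]  # predict t+2 .. t+1+NUM_HEADS

def argmax_token(hidden, weight):
    logits = [sum(h * weight[i][v] for i, h in enumerate(hidden))
              for v in range(VOCAB)]
    return max(range(VOCAB), key=logits.__getitem__)

def speculate(hidden):
    # Every head, base and Medusa alike, reads the SAME final decoder
    # hidden state, so one forward pass yields a multi-token draft.
    return [argmax_token(hidden, w) for w in [lm_head] + medusa_heads]

hidden = [random.gauss(0, 1) for _ in range(HIDDEN)]
draft = speculate(hidden)
print(len(draft))  # NUM_HEADS + 1 draft tokens from a single hidden state
```

The drafted tokens are then verified in one pass by the base model; the sketch only shows the drafting side, which is the part that assumes a plain decoder hidden state rather than one produced through cross-attention over encoder outputs.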