EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Phi3.5V Server API Error: Forward step expected a PagedAttention input metadata. #756

Closed ytnvj2 closed 2 months ago

ytnvj2 commented 2 months ago

Hi, I am running the Phi-3.5 vision model on an Apple M2 MacBook using the command below: `cargo run --release --features metal -- --port 1234 vision-plain -m microsoft/Phi-3.5-vision-instruct -a phi3v`

Everything loads fine, but when I query I get this error:

mistralrs_core::engine: prompt step - Model failed with error: Msg("Forward step expected a PagedAttention input metadata. This was not provided, please ensure that the scheduler config is correctly configured for PagedAttention.")

On the other hand, if I load the model using the Python API, everything works fine, but I am not sure how to enable ISQ in Python.

```python
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

runner = Runner(
    which=Which.VisionPlain(
        model_id="microsoft/Phi-3.5-vision-instruct",
        arch=VisionArchitecture.Phi3V,
    ),
)
```

Any idea what might be causing this and how to fix it?

JCRPaquin commented 2 months ago

Same thing happens with Phi-3.5-MoE.

For Phi-3.5-MoE: It looks like the default scheduler path can reach the NormalPipeline, which requires(?) paged attention metadata. Likely something similar is happening for the vision model.

JCRPaquin commented 2 months ago

Ah, so the quantized pipelines can handle the lack of paged attention metadata. I'm not clear on why the Python API is routing through a different part of the code, though.

@ytnvj2 can you retry with a GGUF version of the model via the CLI? Using a GGUF might wholly disable vision. The GGUF loading panics in an unrelated area.

JCRPaquin commented 2 months ago

Potential confounder: I might be misreading things, but it looks like Paged Attention is disabled for non-CUDA targets (including Metal). https://github.com/EricLBuehler/mistral.rs/blob/366f9f02b7a55af8d8f32df33fd77ea6bcea8b5a/mistralrs-core/src/utils/mod.rs#L225-L233
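To make the suspected mismatch concrete, here is a minimal Rust sketch of the failure mode described above. It is an illustration only: every type and function name below is hypothetical and not taken from the mistral.rs codebase, and it assumes (per the linked snippet) that PagedAttention is only enabled on CUDA. The scheduler side produces no metadata on Metal, the non-quantized pipeline still insists on it, and the quantized pipeline falls back to a regular cache.

```rust
// Hypothetical sketch: none of these names come from mistral.rs.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Cpu,
    Cuda,
    Metal,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PipelineKind {
    Normal,
    Quantized,
}

/// Stand-in for the per-step metadata PagedAttention would need.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct PagedAttentionMeta {
    block_size: usize,
}

/// Assumption for this sketch: PagedAttention is only available on CUDA.
fn paged_attention_supported(device: Device) -> bool {
    matches!(device, Device::Cuda)
}

/// Scheduler side: metadata is produced only when the device supports it.
fn schedule(device: Device) -> Option<PagedAttentionMeta> {
    if paged_attention_supported(device) {
        Some(PagedAttentionMeta { block_size: 32 })
    } else {
        None
    }
}

/// Pipeline side in the mismatched state: the normal pipeline requires
/// metadata, while the quantized pipeline tolerates its absence.
fn forward_strict(
    kind: PipelineKind,
    meta: Option<&PagedAttentionMeta>,
) -> Result<&'static str, String> {
    match (kind, meta) {
        (_, Some(_)) => Ok("paged attention"),
        (PipelineKind::Quantized, None) => Ok("regular KV cache"),
        (PipelineKind::Normal, None) => {
            Err("Forward step expected a PagedAttention input metadata.".to_string())
        }
    }
}

/// One possible consistent variant: every pipeline falls back to a
/// regular cache when the scheduler provided no metadata.
fn forward_consistent(meta: Option<&PagedAttentionMeta>) -> &'static str {
    match meta {
        Some(_) => "paged attention",
        None => "regular KV cache",
    }
}
```

With `schedule(Device::Metal)` returning `None`, `forward_strict(PipelineKind::Normal, None)` yields the error from the report, while the quantized path and `forward_consistent` do not. This only shows one way the two sides could be made to agree; it says nothing about how the actual code resolves it.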

EricLBuehler commented 2 months ago

Hi @JCRPaquin @ytnvj2, thank you for the details and all the investigation! Indeed, you are correct: the issue arises when there is a discrepancy there. This bug was a regression introduced by #753, and I just merged #759, which should fix it. Can you please confirm this works?

JCRPaquin commented 2 months ago

@EricLBuehler thanks for the quick response! I'll try the fix in a few hours.

ytnvj2 commented 2 months ago

@EricLBuehler Tried out the fix and it works for me now. Thank you for the quick fix. Appreciate it.