Closed by ytnvj2 2 months ago
Same thing happens with Phi-3.5-MoE.
For Phi-3.5-MoE: It looks like the default scheduler path can reach the NormalPipeline, which requires(?) paged attention metadata. Likely something similar is happening for the vision model.
Ah, so the quantized pipelines can handle the lack of paged attention metadata. I'm not clear on why the Python API is routing through a different part of the code, though.
@ytnvj2 can you retry with a GGUF version of the model via the CLI? Using a GGUF might wholly disable vision. The GGUF loading panics in an unrelated area.
Potential confounder: I might be misreading things, but it looks like Paged Attention is disabled for non-CUDA targets (including Metal). https://github.com/EricLBuehler/mistral.rs/blob/366f9f02b7a55af8d8f32df33fd77ea6bcea8b5a/mistralrs-core/src/utils/mod.rs#L225-L233
Hi @JCRPaquin @ytnvj2, thank you for the details and the investigation! Indeed, you are correct: the issue arises when there is a discrepancy there. This bug was a regression from #753, and I just merged #759, which should fix it. Can you please confirm that this works?
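For anyone reading along later, the mismatch discussed in this thread can be illustrated with a toy model. This is only a sketch and not mistral.rs code: `PagedMeta`, `pick_scheduler`, and `forward_step` are hypothetical names. It assumes the failure mode described above, where the pipeline expects PagedAttention metadata while the scheduler takes the default path (for example on Metal) and supplies none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PagedMeta:
    """Hypothetical stand-in for PagedAttention input metadata."""
    block_size: int


def pick_scheduler(device: str) -> Optional[PagedMeta]:
    # Assumption taken from the thread: PagedAttention is only enabled on CUDA,
    # so other targets (e.g. Metal) use the default scheduler and get no metadata.
    return PagedMeta(block_size=32) if device == "cuda" else None


def forward_step(pipeline_expects_paged: bool, meta: Optional[PagedMeta]) -> str:
    # The pipeline and the scheduler must agree; if the pipeline expects
    # metadata that the scheduler never produced, the step fails.
    if pipeline_expects_paged and meta is None:
        raise RuntimeError("Forward step expected a PagedAttention input metadata.")
    # Metadata is only used when the pipeline actually asks for it.
    return "paged" if pipeline_expects_paged and meta is not None else "default"
```

In this sketch, `forward_step(True, pick_scheduler("metal"))` raises, which mirrors the reported error, while `forward_step(False, pick_scheduler("metal"))` returns `"default"`. The real fix may be structured quite differently; the point is only that both sides have to make the same decision about PagedAttention.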
@EricLBuehler thanks for the quick response! I'll try the fix in a few hours.
@EricLBuehler Tried out the fix and it works for me now. Thank you for the quick fix. Appreciate it.
Hi, I am running the Phi-3.5 vision model on an Apple M2 MacBook with the command below: `cargo run --release --features metal -- --port 1234 vision-plain -m microsoft/Phi-3.5-vision-instruct -a phi3v`
Everything loads fine, but when I query I get this error:
mistralrs_core::engine: prompt step - Model failed with error: Msg("Forward step expected a PagedAttention input metadata. This was not provided, please ensure that the scheduler config is correctly configured for PagedAttention.")
On the other hand, if I load the model using the Python API, everything works fine, but I am not sure how to enable ISQ in Python.

```python
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

runner = Runner(
    which=Which.VisionPlain(
        model_id="microsoft/Phi-3.5-vision-instruct",
        arch=VisionArchitecture.Phi3V,
    ),
)
```

Any idea what might be causing this and how to fix it?
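On the ISQ part of the question, here is a minimal sketch of one way it might be enabled from Python. It assumes that `Runner` accepts an `in_situ_quant` keyword (mirroring the CLI's `--isq` flag) and that `"Q4K"` is a valid value; neither is confirmed in this thread, so please check against the version of `mistralrs` you have installed. `runner_kwargs` and `build_runner` are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def runner_kwargs(isq: Optional[str] = None) -> Dict[str, Any]:
    """Build extra keyword arguments for Runner.

    `in_situ_quant` is an assumed keyword name, not confirmed in this thread.
    """
    kwargs: Dict[str, Any] = {}
    if isq is not None:
        kwargs["in_situ_quant"] = isq  # e.g. "Q4K" (assumed value)
    return kwargs


def build_runner(isq: Optional[str] = None):
    # Imported lazily so this file can be loaded without mistralrs installed.
    from mistralrs import Runner, Which, VisionArchitecture

    return Runner(
        which=Which.VisionPlain(
            model_id="microsoft/Phi-3.5-vision-instruct",
            arch=VisionArchitecture.Phi3V,
        ),
        **runner_kwargs(isq),
    )
```

Calling `build_runner("Q4K")` would then load the same model as above with quantization requested, and `build_runner()` keeps the unquantized behaviour.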