Updates the non-cli code snippet to automatically select a compatible flash attention implementation (rather than requiring the user to configure it themselves).
We could do the same for the device as well. If a lot of usage comes from mps, this could make sense. In my experience, most Transformers usage comes from cuda, but this repo has more of an mps push, so it might be different here.
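A minimal sketch of what the auto-selection could look like; the helper names are hypothetical, and the strings assume the `attn_implementation` values accepted by `from_pretrained` in Transformers:

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Pick an attn_implementation string without user configuration.

    Prefers FlashAttention 2 when the flash_attn package is installed,
    otherwise falls back to PyTorch's SDPA kernels.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


def pick_device() -> str:
    """Hypothetical device auto-selection (the mps case discussed above).

    Checks cuda first, then mps, then falls back to cpu. Imported lazily so
    the helper stays importable in environments without torch installed.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

The snippet would then pass these values through, e.g. `from_pretrained(..., attn_implementation=pick_attn_implementation())` and `.to(pick_device())`, instead of hard-coding them.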
cc @Vaibhavs10