Closed theobjectivedad closed 3 weeks ago
Thanks for the PR. The torch bump needs a few more changes, I'll handle it here myself
Certainly, I am working on the flash-attn issue mentioned in #453 now and will remove draft status once I've tested the fix. Hopefully this doesn't go too far down the rabbit hole...
I bumped the torch version in #467
This PR should be safe now. Would you mind merging the latest dev into your branch?
EDIT: just noticed your PR is targeted at main, can you rebase it onto dev please?
This LGTM. Any other changes you think we need to make, @theobjectivedad?
I'll merge this now as we need to make a release ASAP, thanks for the contribution!
This PR fixes #458, where under some circumstances UID 1000 needs to write triton cache data to `/app/aphrodite-engine/.triton`, which was previously owned by UID 0.

Additionally, the Dockerfile will now accept a build argument to limit `ninja` to a maximum number of workers for systems with limited RAM. For example, the following command will cap at 5 parallel jobs and 128GB RAM:

Finally, I needed to bump to `torch==2.3.0` to address the error discussed in #453.
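
For anyone choosing a value for the worker limit, here is a rough sketch of how a cap could be derived from available RAM. It is not part of this PR: the function name `pick_max_jobs` and the per-job memory figure are hypothetical, and the actual peak memory of a compile job has to be measured on your own system.

```python
import os
from typing import Optional


def pick_max_jobs(total_ram_gb: float,
                  ram_per_job_gb: float,
                  cpu_count: Optional[int] = None) -> int:
    """Return a parallel-job cap that respects both CPU count and RAM.

    ram_per_job_gb is an assumed peak memory per compile job; it is an
    illustrative input, not a number taken from this PR.
    """
    if ram_per_job_gb <= 0:
        raise ValueError("ram_per_job_gb must be positive")
    # Fall back to the host CPU count when the caller does not supply one.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Number of jobs that fit in RAM at the assumed per-job footprint.
    jobs_by_ram = int(total_ram_gb // ram_per_job_gb)
    # Always allow at least one job so the build can make progress.
    return max(1, min(cpus, jobs_by_ram))


if __name__ == "__main__":
    # Assuming roughly 25 GB per job, 128 GB of RAM allows 5 jobs.
    print(pick_max_jobs(128, 25, cpu_count=32))
```

The result would then be passed as the value of the new build argument when building the image.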