msaroufim opened this issue 2 years ago
Many examples are not working with T4/V100, such as detectron2 and stable diffusion, which is why we directly blocked V100 and T4.
Another reason is that CUTLASS's focus has shifted to Ampere and Hopper, so we have to drop some features to reduce maintenance cost.
@antinucleon Thanks for the clarification. I think this would impact many users who are running inference workloads on lower-end GPUs and looking to these optimizations to make them even cheaper. Given that Ampere GPUs are not easy to access, especially on cloud providers such as AWS, I wonder if there is any particular reason for this shift, and whether there is any opportunity to extend the support.
@HamidShojanazeri Thanks for the suggestion. Given our team size and our workload supporting internal production needs, we don't have the bandwidth to enable V100/T4. If the community/NVIDIA is going to help enable T4/V100 across all examples, that would be fantastic.
cc @philschmid, who I figure may be interested in community support. It may be worth scoping this exercise to community members so it's more scalable for us to support more examples. So something like:
At least I wonder how many models will fall under bucket 3
@antinucleon Is there a list of which kernels are not supported on V100? For example, what is blocking in stable diffusion? We could avoid only those kernels until they are backported.
I don't have V100 access; I will try to find one and make the list.
@antinucleon are there any updates on this issue?
The README.md says: "NVIDIA: AIT is only tested on SM80+ GPUs (Ampere etc). Not all kernels work with old SM75/SM70 (T4/V100) GPUs."
I interpreted that as "it may work, but we won't guarantee it." However, in https://github.com/facebookincubator/AITemplate/blob/main/python/aitemplate/testing/detect_target.py#L41 there's an explicit gate on V100; if I remove that gate, the example works and is also 2x faster.
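For context, the gate amounts to a compute-capability check along these lines. This is a rough sketch in plain PyTorch (`torch.cuda.get_device_capability`), not the actual detect_target.py code; the function name, `strict` flag, and SM80 threshold here are illustrative assumptions:

```python
# Minimal sketch of a compute-capability gate (not AITemplate's actual
# detect_target.py). strict=True mimics a hard block on pre-Ampere GPUs;
# strict=False only warns, which is roughly what relaxing the check amounts to.
import torch

MIN_SUPPORTED_SM = (8, 0)  # Ampere (SM80); V100 is (7, 0), T4 is (7, 5)

def check_gpu_arch(strict: bool = True) -> tuple:
    """Return the current device's (major, minor) compute capability."""
    major, minor = torch.cuda.get_device_capability(0)
    if (major, minor) < MIN_SUPPORTED_SM:
        msg = f"SM{major}{minor} GPU detected; only SM80+ is officially tested."
        if strict:
            raise RuntimeError(msg)
        print(f"Warning: {msg} Some kernels may fail.")
    return major, minor
```

Turning the hard error into a warning is essentially what "removing the gate" means in my experiment above.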
If this wasn't intended, please let me know and I can make a PR to fix it. V100 and T4 are by far the most popular GPUs I see among enterprises.
Performance on V100
Repro