PygmalionAI / aphrodite-engine

PygmalionAI's large-scale inference engine
https://pygmalion.chat
GNU Affero General Public License v3.0

fix: restore backwards compatibility with sm_60 (P100 and GP100) #444

Closed AlpinDale closed 2 weeks ago

AlpinDale commented 2 weeks ago

Looks like sm_60 can't do a four-element dot product. There's a better solution, but this will work for now. I may implement an sm_60-only dot product function here later.

resolves #413

dirkson commented 2 weeks ago

I was able to build and run this on 1x and 4x P100s more or less flawlessly with the provided runtime. I tested turboderp's Llama 3 8B EXL quants and MaziyarPanahi's 8B and 70B GPTQ models.

There was an issue with --context-shift, which is apparently due to Triton currently requiring compute capability 7.0. The GPTQ models generated endlessly, but that's a known issue with the models tested, so I suspect it was unrelated.

Otherwise I really wasn't able to find any issues. Everything I tried either just worked in my setup or gave a reasonable error message about requiring a newer CUDA compute capability.

Performance, once I adjusted to the way Aphrodite reports it, was pretty solid: the 70B model with -tp 4 ran at 12 t/s, up from 3.5 t/s under tabbyAPI. The 8B model ran at 50 t/s on a single P100, or 25 t/s across multiple; it ran at around 40 t/s on tabbyAPI.