Open imabot2 opened 9 months ago
Was this model working with `offload_per_layer = 3`? I was trying to use it on a V100 in Google Colab but ran into an issue with Triton.
Most likely this is a version issue with Triton. If you are using v2.2.0, you need to downgrade; you can refer to the issue I raised, #25.
Test if that works.
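Before downgrading, it may help to confirm which Triton version is actually installed. A minimal sketch (the `pip install "triton<2.2"` pin in the comment is just one way to express "below v2.2.0"; see #25 for specifics):

```python
import importlib.metadata

# Report the installed Triton version; v2.2.0 reportedly triggers
# the error, so anything >= 2.2 warrants a downgrade, e.g.:
#   pip install "triton<2.2"
try:
    triton_version = importlib.metadata.version("triton")
except importlib.metadata.PackageNotFoundError:
    triton_version = None  # Triton not installed in this environment

print(triton_version)
```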
Hi, you did awesome work! I ran your code on an RTX 3090 with `offload_per_layer = 0`: awesome!!! I noticed that when I switch to my second GPU with `device = torch.device("cuda:1")`, the model is properly loaded into GPU memory, but inference does not work. I can't figure out what's wrong; any idea?
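One common culprit in this situation (a guess, not confirmed in the thread) is that the input tensors end up on a different device than the model weights, so the model loads fine but the forward pass fails with a device-mismatch error. A minimal sketch, assuming a PyTorch/Hugging Face-style `generate` API; `pick_device` is a hypothetical helper, not part of the repo:

```python
import torch

def pick_device(index: int = 1) -> torch.device:
    # Hypothetical helper: fall back to cuda:0 or CPU when the
    # requested GPU index does not exist, so the same script also
    # runs on single-GPU machines.
    if torch.cuda.device_count() > index:
        return torch.device(f"cuda:{index}")
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    return torch.device("cpu")

device = pick_device(1)
print(device)

# Inference only works if the inputs live on the same device as the
# model. With a Hugging Face-style API that would look like:
#
#   inputs = tokenizer(prompt, return_tensors="pt").to(device)
#   output = model.generate(**inputs, max_new_tokens=64)
```

If the model was loaded onto `cuda:1` but the inputs were built with the default `cuda:0` (or left on CPU), that alone explains "loads fine, inference fails".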