Vahe1994 / SpQR


improved offload_activations #27

Closed · poedator closed this 12 months ago

poedator commented 1 year ago

Improves offloading in quantize(); adds offloading in eval(). Tested in Nirvana.
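
For readers of this PR, the general shape of activation offloading during layer-by-layer processing is roughly the following. This is a minimal sketch, not the PR's actual code: `run_layer` and its signature are invented for illustration, and the real implementation lives in quantize() / eval() in main.py.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def run_layer(layer: nn.Module, inps: torch.Tensor, outs: torch.Tensor,
              offload_activations: bool = True, device: str = "cuda") -> None:
    """Feed each calibration sample in `inps` through `layer`, writing the
    results into `outs`. With offloading enabled, both buffers stay in CPU
    RAM and only the sample currently being processed occupies GPU memory."""
    for j in range(inps.shape[0]):
        x = inps[j:j + 1]                        # one sample, batch dim kept
        if offload_activations:
            x = x.to(device, non_blocking=True)  # CPU -> GPU just for this step
        y = layer(x)
        if offload_activations:
            y = y.cpu()                          # GPU -> CPU; frees GPU memory
        outs[j:j + 1] = y

if torch.cuda.is_available():
    layer = nn.Linear(16, 16).cuda()
    inps = torch.randn(64, 8, 16)   # calibration activations kept on CPU
    outs = torch.empty_like(inps)
    run_layer(layer, inps, outs)
```

The extra host-to-device copies are what trade a bit of speed for the memory savings discussed below.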

Vahe1994 commented 12 months ago

Great work @poedator, this seems good.

According to the experiments you provided, the results with and without offloading match the paper's perplexity (see picture below). Turning on activation offloading costs a little speed, but saves a significant amount of GPU memory. In the future we could make activation offloading the default behavior, though I don't insist on it (it would increase RAM and CPU consumption). With this, people can comfortably quantize a 65B model on a 3090 with 24GB of GPU memory, and a 30B model on a 1080 Ti (needs to be checked).

[photo_2023-07-24_13-17-41: perplexity results with and without activation offloading]

As for the PR:

  1. Please add better documentation for the activation offloading option (a possible sketch follows this list): https://github.com/Vahe1994/SpQR/blob/d71dcc29785b3c967d45c4a0c94d0fa4cd307040/main.py#L514C1-L514C57
  2. In the README, add a few words about this option.
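
For item 1, the help string could read something like the snippet below. This is only an illustrative sketch; the actual flag registration in main.py may be set up differently.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical wording for the requested documentation; adjust to match
# the real argparse definition in main.py.
parser.add_argument(
    "--offload_activations",
    action="store_true",
    help="Keep calibration activations in CPU RAM and move them to the GPU "
         "one sample at a time. Slightly slower, but greatly reduces peak "
         "GPU memory usage at the cost of extra CPU RAM.",
)
args = parser.parse_args()
```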