codellama-7b working on GPUs with 24GB memory

FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

Apache License 2.0


codellama-7b working on GPUs with 24GB memory #9

Closed wangpatrick57 closed 7 months ago

wangpatrick57 commented 7 months ago

Previously, codellama-7b did not run on AWS g6.2xlarge machines, which have 24 GB of GPU memory. I made a small change in rest_test.py so that it matches the logic in gen_model_answer_rest.py, which resolves the memory issue.
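
For a rough sense of why 24 GB is tight for a 7B-parameter model, here is a back-of-the-envelope sketch that estimates the memory taken by the weights alone at two precisions. The nominal parameter count (7e9), the 24 GiB budget, and the helper names are illustrative assumptions. None of them are taken from rest_test.py or gen_model_answer_rest.py, and the sketch does not claim to reproduce the change in this PR. Activations and the KV cache are ignored, so real usage is higher.

```python
# Rough estimate of weight memory for a model at different precisions.
# All numbers below are illustrative assumptions, not measurements.

GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def fits(num_params: float, bytes_per_param: int, budget_gib: float) -> bool:
    """Return True if the weights alone fit within the given budget."""
    return weight_memory_gib(num_params, bytes_per_param) <= budget_gib


NUM_PARAMS = 7e9      # assumption: nominal size of a "7b" model
BUDGET_GIB = 24.0     # assumption: a GPU with 24 GB, treated as 24 GiB

if __name__ == "__main__":
    for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
        needed = weight_memory_gib(NUM_PARAMS, nbytes)
        verdict = "fits" if fits(NUM_PARAMS, nbytes, BUDGET_GIB) else "does not fit"
        print(f"{dtype_name}: {needed:.2f} GiB for weights, {verdict} in {BUDGET_GIB:.0f} GiB")
```

Under these assumptions, full-precision weights alone exceed the budget while half-precision weights leave headroom, which is why the dtype and device placement used when loading the model are a common place to look when a 7B model fails on a 24 GB card.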