FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024
Apache License 2.0

codellama-7b working on GPUs with 24GB memory #9

Closed by wangpatrick57 6 months ago

wangpatrick57 commented 6 months ago

Previously, codellama-7b did not run on AWS g6.2xlarge machines (24 GB of GPU memory). I made a small change to rest_test.py so that it matches the logic in gen_model_answer_rest.py, which resolved the memory issue.
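
The description above does not say what the change was. For context, a rough back-of-the-envelope check of why a 7B model can fail to fit in 24 GB is sketched below. It assumes roughly 7 billion parameters and counts only the weights; the parameter count, the `weight_memory_gib` helper, and the idea that precision was the cause are illustrative assumptions, not details taken from this PR.

```python
# Rough weight-only memory estimate. It ignores activations, the KV cache,
# and anything else resident on the GPU, so real usage is higher.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 7e9       # assumption: approximate size of a "7b" model
GPU_MEMORY_GIB = 24    # from the PR title

for dtype in ("float32", "float16"):
    needed = weight_memory_gib(NUM_PARAMS, dtype)
    verdict = "fits" if needed < GPU_MEMORY_GIB else "does not fit"
    print(f"{dtype}: {needed:.1f} GiB of weights, {verdict} in {GPU_MEMORY_GIB} GiB")
```

Under these assumptions, 32-bit weights alone exceed 24 GiB while 16-bit weights leave headroom, which is one common reason two scripts that load the same model differently behave differently on the same GPU.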