#8188 · Closed 8 months ago
I'm interested in the same question. How do we feed in large schemas (I'm talking > 10k) without running out of memory on inference?
This issue is stale because it has been open for 30 days with no activity.
Same question here.
Try `load_in_4bit=True` and `num_beams=3`. It works for me on a 4090 (24 GB).
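A minimal sketch of what that suggestion looks like with Hugging Face transformers, assuming the model is a standard causal LM checkpoint (the model name below is a placeholder, not from this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "your-org/your-7b-text2sql-model"  # placeholder, replace with the actual checkpoint

# 4-bit quantization roughly quarters the weight memory compared to fp16,
# which is what leaves headroom for long schema prompts on a 24 GB card.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # equivalent to passing load_in_4bit=True
    device_map="auto",
)

prompt = "-- given the schema in metadata.sql, write a query that ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, num_beams=3, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```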
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
I use the 7B model on a 3090 (24 GB). Loading it takes about 15 GB of memory, which is fine. At inference time it is a nice model indeed, but it eats much more memory as metadata.sql grows even a little; a metadata.sql of only ~4k can already cause CUDA out of memory. Any good idea how to solve this?
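A small sketch (not from this thread) for reproducing the reported behaviour: peak GPU memory grows with the length of the schema prompt, so a larger metadata.sql can tip a 24 GB card into CUDA OOM. It assumes `model` and `tokenizer` are already loaded as in the 4-bit example above; the file names are placeholders.

```python
import torch

def peak_memory_for_prompt(model, tokenizer, prompt: str) -> float:
    """Return peak GPU memory (GB) used while generating for one prompt."""
    torch.cuda.reset_peak_memory_stats()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=64)
    return torch.cuda.max_memory_allocated() / 1024**3

# Example: compare a trimmed schema against the full metadata.sql.
# short_schema = open("metadata_small.sql").read()   # placeholder file
# full_schema = open("metadata.sql").read()
# print(peak_memory_for_prompt(model, tokenizer, short_schema))
# print(peak_memory_for_prompt(model, tokenizer, full_schema))
```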