Open petergerten opened 1 week ago
Hi,

can somebody explain why this requires ~20 GB of VRAM? For 750M + 450M parameters that seems very strange. The readme indicates that there is room for optimization, but I would like to understand what the main problem is.

Hi @petergerten, the main issue is that we don't have a hard limit in the code on the number of characters you can generate, so the 20 GB of VRAM is a rather crude upper bound. For short pieces of text and references, you can probably get away with much less. We'll get to proper VRAM testing with different inference lengths and update the readme shortly. Hope that helps!
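For a back-of-the-envelope check, the weights alone for 750M + 450M parameters are only a few GB; the rest of the budget goes to the KV cache and activations, which grow with generation length when no cap is set. Below is a minimal sketch of that arithmetic. The layer counts, head counts, and dtypes are hypothetical placeholders, not values from this repo:

```python
def param_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory to hold the weights alone (no optimizer state, no activations)."""
    return n_params * bytes_per_param / 1024**3


def kv_cache_gb(n_layers: int, n_heads: int, head_dim: int,
                seq_len: int, bytes_per_el: int = 2) -> float:
    """Key/value cache for a decoder: 2 tensors (K and V) per layer,
    each of shape [seq_len, n_heads, head_dim]. Grows linearly with
    the number of tokens generated."""
    return 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_el / 1024**3


# Weights for 750M + 450M params in fp32 vs fp16.
weights_fp32 = param_memory_gb(1.2e9, 4)  # roughly 4.5 GB
weights_fp16 = param_memory_gb(1.2e9, 2)  # roughly 2.2 GB

# Hypothetical architecture: 24 layers, 16 heads, head_dim 64, fp16 cache.
# At 10k tokens the cache is already near 1 GB, and it keeps growing
# without a length limit.
cache_10k = kv_cache_gb(n_layers=24, n_heads=16, head_dim=64, seq_len=10_000)
```

This is why an uncapped generation length makes a single VRAM number hard to state: the weights are fixed, but the cache term is unbounded.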