Do you know which line the out-of-memory happens at?
It happens while loading shards from disk inside Model.from_pretrained() on https://github.com/FMInference/FlexGen/blob/0342e2a0e93593b2c11f84be0e9f5d5bcb73e598/flexgen/opt_config.py#L146 .
I'm also encountering these errors. @xloem were you able to modify the code to get it to work?
I modified that function to pass the kwargs I mentioned, and also called it manually before anything else was constructed so that more RAM was free. That got me farther, but I then hit a later crash that I haven't looked into yet.
EDIT: it looks like the second crash is because the policy needs changing for the model and system I'm using; adding the kwargs does move past this issue. I also added code to wipe transformers.utils.TRANSFORMERS_CACHE after the initial download if not enough disk space remained available.
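Roughly, the cache cleanup I added looks like the sketch below, assuming the default transformers cache location; the helper name and the free-space threshold are just illustrative:

```python
import shutil
from transformers.utils import TRANSFORMERS_CACHE

def maybe_wipe_hf_cache(min_free_bytes=20 * 2**30):
    # After the initial download, drop the Hugging Face download cache
    # if little disk space remains (the converted weights live elsewhere).
    if shutil.disk_usage(".").free < min_free_bytes:
        shutil.rmtree(TRANSFORMERS_CACHE, ignore_errors=True)
```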
In case anyone else finds this thread and is in a similar situation to mine (the opt-13b model, a 1080 Ti with 11GB VRAM + 32GB CPU RAM, and a 2GB swap file, but, unlike OP, able to enlarge it): try enlarging your swapfile. I created a 16GB swapfile and now it works.
I see this issue was closed without a change or explanation, and I'm guessing it's out of scope for now or that the changes would need to be introduced as a PR.
Sorry, I misread the thread and thought the problem had been resolved. I've reopened it and will get to it soon.
@xloem This should be fixed by #69. It is merged into the main branch. Could you try it now?
From inspection it looks like you've resolved the issue. I might delete the .bin file after conversion to save disk space; maybe you already do that and I missed it.
I tried to pull the changes but it looks like there’s been a force push and the new tip doesn’t merge with my old checkout. My test code doesn’t quite run yet against the new codebase but I’ll keep in mind you fixed this.
Thank you.
I'm on a system hard-limited to 40GB of CPU RAM + swap.
When I try to load opt-30b, the process is killed from memory exhaustion. If I load the model manually with `device_map="auto", offload_folder="offload"`, the load succeeds. Is there a way to pass these flags manually, or otherwise accommodate RAM limits during the initial load?
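For context, the manual load that succeeds looks roughly like this; the model name and dtype are shown for illustration and may differ from what FlexGen does internally:

```python
import torch
from transformers import AutoModelForCausalLM

# Manual load that fits within the 40GB limit: accelerate places layers on
# GPU, then CPU, and spills the remainder to disk instead of holding every
# shard in CPU RAM at once.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b",
    torch_dtype=torch.float16,   # illustrative dtype
    device_map="auto",
    offload_folder="offload",
)
```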