FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Doesn't seem to obey --path argument; instead tries to download to .cache again #20

Open hsaito opened 1 year ago

hsaito commented 1 year ago

On Windows at least, it seems --path is not obeyed: the weights keep downloading into the .cache directory on the C:\ drive (where I don't have enough space).

I've manually downloaded the required files onto another drive, where the relative path is ../models--facebook--opt-30b.

Executing python -m flexgen.flex_opt --model facebook/opt-30b --path ../models--facebook--opt-30b still causes it to download into the aforementioned .cache directory. I also tried the variant python -m flexgen.flex_opt --model facebook/opt-30b --path ../models--facebook--opt-30b/snapshots/ceea0a90ac0f6fae7c2c34bcb40477438c152546, with the same result.

Am I misunderstanding the way --path works, or is there something wrong with it?

It would also be nice to have an option to inhibit all automatic downloads and simply stop with an error, as I do not want to exhaust disk space that is already precious.

Ying1123 commented 1 year ago

The --path argument only takes effect after the weights exist locally: FlexGen first downloads the weights using huggingface/transformers, which uses the .cache directory by default, and then converts the huggingface format to its own format.

The weight download function is here: https://github.com/Ying1123/FlexGen-dev/blob/876fff7d33caecca5c972efcf5028fe6250ceeae/flexgen/flex_opt.py#L641-L642. If you are familiar with how huggingface/transformers works, you can modify it for your needs.
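For illustration, a minimal sketch of the transformers side of that step; cache_dir is a standard from_pretrained parameter, and D:/hf_cache is just an example path (whether FlexGen's own downloader exposes this depends on the linked code):

```python
from transformers import AutoModelForCausalLM

# Fetch (or reuse) the weights under an explicit cache directory
# instead of the default ~/.cache/huggingface. Note this loads the
# full 30B checkpoint, so it is slow and memory-hungry.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b",
    cache_dir="D:/hf_cache",  # example path; pick a drive with space
)
```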

freedmand commented 1 year ago

You can modify where the transformers module places cached files by setting the environment variable TRANSFORMERS_CACHE. See here for additional information: https://stackoverflow.com/questions/63312859/how-to-change-huggingface-transformers-default-cache-directory
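For example, a short sketch of setting it from Python (D:/hf_cache is an illustrative path; on Windows you could equally run set TRANSFORMERS_CACHE=D:\hf_cache in the shell before launching FlexGen):

```python
import os

# Must be set before transformers / huggingface_hub are imported,
# otherwise the default ~/.cache location may already be picked up.
os.environ["TRANSFORMERS_CACHE"] = "D:/hf_cache"  # example path

from transformers import AutoConfig

# Subsequent downloads now land under D:/hf_cache.
config = AutoConfig.from_pretrained("facebook/opt-30b")
```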

Yang-HangWA commented 1 year ago

@Ying1123 Hi, I have the same confusion. I downloaded the facebook/opt-30b model from huggingface, but the model files can't be recognized by transformers==4.25.1. Is there a way to use local weights?

Vinkle-hzt commented 1 year ago

@hsaito PR #111 adds a --local argument for loading a manually downloaded model; you can give it a try.
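For example (exact usage may differ; see the PR), something like python -m flexgen.flex_opt --model facebook/opt-30b --path ../models--facebook--opt-30b --local should skip the automatic download and read the weights from the given directory.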