GAIR-NLP / anole

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
https://huggingface.co/spaces/ethanchern/Anole

Can you please add the option to run it on a CPU? #13

Open Manni1000 opened 1 month ago

Manni1000 commented 1 month ago

Can you add the option to run it on a CPU?

Manni1000 commented 1 month ago

Or maybe a lower-precision (quantized) version, so people can run it with less VRAM, say 24 GB?
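For context, even before full quantization, simply casting weights to half precision halves their memory footprint (8-bit or 4-bit quantization via libraries like bitsandbytes would go further). A minimal sketch of the idea, using a toy `nn.Linear` as a stand-in for the real model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model (the principle is the same:
# parameter memory = numel * bytes-per-element).
model = nn.Linear(1024, 1024)

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

# Cast all parameters to float16, halving their storage.
model = model.half()

fp16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(fp32_bytes, fp16_bytes)  # the fp16 footprint is exactly half
```

Note that fp16 only helps if every op in the forward pass supports half precision; on CPU, bfloat16 is often the safer choice.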

EthanC111 commented 1 month ago

Thanks a lot for your interest! We will add this to our TODO list!

trygvebw commented 1 month ago

For whatever reason, changing the map_location argument (passed to torch.load) inside _convert(...) in loader.py from cuda to cpu made it work on my 24 GB GPU... and the weights seem to have ended up on the GPU anyway?

That is, the following prints cuda:0 for every parameter even with map_location set to cpu:

for _, param in model.named_parameters():
    print(param.device)

I assume there's an implicit .to('cuda') call somewhere in the code. But this raises a question: if loading the weights to the CPU and then moving them to the GPU works, why do I run out of memory when loading them directly to the GPU (i.e., with map_location set to cuda)? Does the torch.load call load additional weights that are not actually used?
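A plausible explanation (not confirmed against Anole's loader): with map_location='cuda', torch.load materializes the entire checkpoint dict in VRAM while the model's own parameters are also allocated there, roughly doubling peak usage; loading to CPU first and moving afterwards avoids that spike. A minimal sketch of the CPU-first pattern, where load_checkpoint_cpu_first and the toy model are hypothetical illustrations, not Anole's actual code:

```python
import os
import tempfile

import torch
import torch.nn as nn

def load_checkpoint_cpu_first(model, path, device="cpu"):
    # Load checkpoint tensors into CPU RAM, not VRAM.
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    del state_dict  # free the temporary checkpoint copy before moving
    # Only the live parameters are transferred to the target device.
    return model.to(device)

# Tiny demo with a toy model; use device="cuda" on a real GPU.
model = nn.Linear(4, 2)
with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as f:
    torch.save(model.state_dict(), f.name)

restored = load_checkpoint_cpu_first(nn.Linear(4, 2), f.name, device="cpu")
os.unlink(f.name)

print(all(p.device.type == "cpu" for p in restored.parameters()))
```

If that peak-memory hypothesis is right, the direct-to-GPU path fails not because extra weights are loaded, but because checkpoint and model briefly coexist in VRAM.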

Manni1000 commented 1 month ago

I tried that, but it did not work for me. And yes, in other files some tensors do get loaded onto the GPU.