bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Running bloom-chat on petals #318

Open barsuna opened 1 year ago

barsuna commented 1 year ago

Hey folks, thank you for developing this amazing project!

Can one run Petals with the BLOOMChat model? https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1

Does one need to convert it? Can this be done locally if I have the weights?

borzunov commented 1 year ago

Hi @barsuna,

Thanks for the kind words!

As far as I understand, this model has the same architecture as the original BLOOM; only the weights are different. If that's the case, integrating it should be easy.
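A quick way to sanity-check that architecture claim (just a sketch; it only needs the small config files, not the weights):

from transformers import AutoConfig

# Both configs are a few KB, so this works without downloading any weights.
chat = AutoConfig.from_pretrained("sambanovasystems/BLOOMChat-176B-v1")
base = AutoConfig.from_pretrained("bigscience/bloom")

# Same model_type and the same shapes => the existing BLOOM code path should apply.
assert chat.model_type == base.model_type == "bloom"
print(chat.n_layer, chat.hidden_size, chat.n_head)
print(base.n_layer, base.hidden_size, base.n_head)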

You do need to convert the model (see the instructions). It can be done locally, but you need a machine with ~350 GB of RAM (the conversion script is not optimal right now and loads the whole model into RAM).
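Conceptually, the conversion does something like the sketch below: load the full checkpoint, then re-save each transformer block as its own shard (simplified, not the actual script — the real one also handles dtypes, the embeddings/tokenizer shard, and uploading). Loading the full checkpoint is what drives the RAM requirement.

import os
import torch
from transformers import AutoModelForCausalLM

# Loading the full 176B checkpoint is what requires ~350 GB of RAM.
model = AutoModelForCausalLM.from_pretrained("../bloom_chat", torch_dtype=torch.bfloat16)

# Re-save each BLOOM transformer block as an independent shard that a
# Petals server can later download and serve on its own.
for i, block in enumerate(model.transformer.h):
    out_dir = f"../bloom_chat_petals/block_{i}"
    os.makedirs(out_dir, exist_ok=True)
    torch.save(block.state_dict(), os.path.join(out_dir, "pytorch_model.bin"))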

Would you like to try doing this yourself? Or, if you get lost or run into any issues, I can try doing it for you.

barsuna commented 1 year ago

Thank you for the quick reply! I have tried running the script locally - this was where I stumbled: it said it needs Git LFS etc. It seems it would try to check in the converted model or something, while I tried running it just specifying the model and output paths:

python -m petals.cli.convert_model --model ../bloom_chat --output_path ../bloom_chat_petals

... there is no issue with RAM, but there is an issue with internet connectivity (it is a lab machine).
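One thing that may help on an air-gapped machine (a sketch, assuming the script's only network use goes through transformers/huggingface_hub and the weights are already on disk): the Hugging Face libraries honor offline-mode environment variables.

import os
import subprocess

# Standard Hugging Face offline switches; they must be set before the
# libraries are imported, hence the subprocess.
env = dict(os.environ, HF_HUB_OFFLINE="1", TRANSFORMERS_OFFLINE="1")

subprocess.run(
    ["python", "-m", "petals.cli.convert_model",
     "--model", "../bloom_chat", "--output_path", "../bloom_chat_petals"],
    env=env,
    check=True,
)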

borzunov commented 1 year ago

@barsuna, I'll launch the conversion procedure on one of our machines. It also has limited bandwidth, so this may take a couple of days.

barsuna commented 1 year ago

@borzunov, thank you, really appreciated!

For what it's worth, I have tried to update the convert_model.py script to work locally, and it produced a bunch of files in the target directory:

$ ls ../bloom_ch_p/
block_0   block_12  block_16  block_2   block_23  block_27  block_30  block_34  block_38  block_41  block_45  block_49  block_52  block_56  block_6   block_63  block_67  block_8
block_1   block_13  block_17  block_20  block_24  block_28  block_31  block_35  block_39  block_42  block_46  block_5   block_53  block_57  block_60  block_64  block_68  block_9
block_10  block_14  block_18  block_21  block_25  block_29  block_32  block_36  block_4   block_43  block_47  block_50  block_54  block_58  block_61  block_65  block_69  main
block_11  block_15  block_19  block_22  block_26  block_3   block_33  block_37  block_40  block_44  block_48  block_51  block_55  block_59  block_62  block_66  block_7

$ ls ../bloom_ch_p/block_0 -l
total 4817268
-rw-rw-r-- 1 barsuna barsuna 4932877985 May 25 16:54 pytorch_model.bin

$ ls ../bloom_ch_p/main/ -l
total 7038936
-rw-rw-r-- 1 barsuna barsuna        590 May 25 17:07 config.json
-rw-rw-r-- 1 barsuna barsuna 7193348019 May 25 17:07 pytorch_model.bin
-rw-rw-r-- 1 barsuna barsuna         96 May 25 17:07 special_tokens_map.json
-rw-rw-r-- 1 barsuna barsuna        313 May 25 17:07 tokenizer_config.json
-rw-rw-r-- 1 barsuna barsuna   14500443 May 25 17:07 tokenizer.json

But I'm not sure how to use these: from what it appears, Petals picks up the model from .cache/petals/..., and the files there are laid out differently, across blobs and refs directories. I guess this mapping is done by Git, so probably I should stop and wait for the converted model :)
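In case it's useful, the blobs/refs layout I see looks like the standard huggingface_hub cache format, so it can at least be inspected with the library's own scanner (just a sketch; the cache path below is my guess based on what I saw on disk):

from pathlib import Path
from huggingface_hub import scan_cache_dir

# Hypothetical path, based on the '.cache/petals/...' directory mentioned above.
cache = Path("~/.cache/petals").expanduser()

# Print each cached repo with its approximate size on disk.
for repo in scan_cache_dir(cache_dir=cache).repos:
    print(repo.repo_id, repo.repo_type, f"{repo.size_on_disk / 1e9:.1f} GB")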