bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

batch processing/parallel processing #585

Open oldcpple opened 5 months ago

oldcpple commented 5 months ago

Hi there, does Petals currently support batch processing/parallel processing? For example, to increase resource usage and system throughput, we would like servers to process multiple prompts in parallel, i.e., batch processing. Is this possible? Thanks a lot.

justheuristic commented 4 months ago

Hi! Both forward/backward passes and autoregressive inference can run with any batch size, provided you have enough memory for it.
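
For example, batched generation is just a regular batched `generate()` call through the Petals client. A minimal sketch (the model name, padding setup, and prompts here are illustrative examples, not from this thread):

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Example model name -- substitute any model served by your swarm
model_name = "petals-team/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # in case the tokenizer has no pad token
tokenizer.padding_side = "left"            # left-pad so generation continues from real tokens
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Two prompts padded to a common length and processed as one batch
prompts = ["A cat sat on", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(inputs["input_ids"], max_new_tokens=10)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```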

In our training examples we use batched training; e.g., this one uses a batch size of 32: https://github.com/bigscience-workshop/petals/blob/main/examples/prompt-tuning-sst2.ipynb
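
A compact sketch of a batched training step in that style (using a toy batch of 2 instead of 32, an example model name, and a plain causal-LM loss rather than the notebook's classification head; `tuning_mode="ptune"` keeps the remote blocks frozen and trains only local soft prompts):

```python
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example; pick any Petals-supported model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name,
    tuning_mode="ptune",  # train soft prompts only; server-side blocks stay frozen
    pre_seq_len=16,
)
opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Toy batch of 2; the SST-2 notebook uses batches of 32.
# For real training, mask padded label positions with -100.
batch = tokenizer(["great movie", "terrible plot"], return_tensors="pt", padding=True)
outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
outputs.loss.backward()  # gradients flow back through the remote blocks
opt.step()
opt.zero_grad()
```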