567-labs / fastllm

A collection of LLM services you can self-host via Docker or Modal Labs to support your application development.
MIT License

Use larger batches for embedding example #29

Closed: aksh-at closed this issue 10 months ago

aksh-at commented 10 months ago

This sends larger batches to each Modal function instead of relying on `allow_concurrent_inputs`. It's a bit annoying to do, but it lets us embed 10% of Wikipedia in 5 minutes, using 30-50 GPUs:

(screenshot of throughput results attached)
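A minimal sketch of the batching idea: group the texts into large fixed-size chunks client-side, then fan each chunk out as a single call, rather than sending one text per call and leaning on `allow_concurrent_inputs` for parallelism. The chunking helper below is plain Python; the Modal usage in the comments is illustrative (the function and parameter names are assumptions, not taken from this PR).

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive fixed-size batches, so each remote call
    embeds many texts instead of one."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


# Hypothetical Modal usage (names are illustrative):
#
#   batches = list(chunked(texts, batch_size=512))
#   # One container invocation per large batch; Modal fans these
#   # out across GPU workers.
#   embeddings = list(embed_batch.map(batches))
```

The trade-off: larger batches mean fewer function invocations and better GPU utilization per call, at the cost of the client having to manage batch boundaries itself.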