Motivation: Batching multiple inference requests together can speed up inference. Batching can be leveraged even in single-input settings, e.g. for speedups via staged speculative decoding.
What: Currently, exo handles each inference request separately. This bounty is for batching requests together, so that multiple inputs can be passed through the model shards in a single forward pass.
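A minimal sketch of the core idea: collect pending requests, pad their token sequences to a common length, and run one forward pass over the stacked batch instead of one pass per request. The names `pad_and_stack` and `shard_forward` are illustrative only, not exo's actual API.

```python
import numpy as np

def pad_and_stack(token_seqs, pad_id=0):
    """Right-pad variable-length token sequences to a common length
    and stack them into one (batch, seq_len) array."""
    max_len = max(len(s) for s in token_seqs)
    batch = np.full((len(token_seqs), max_len), pad_id, dtype=np.int64)
    for i, seq in enumerate(token_seqs):
        batch[i, : len(seq)] = seq
    return batch

def shard_forward(batch):
    # Hypothetical stand-in for a model shard's forward pass; the point
    # is that a single call now serves every request in the batch.
    return batch.shape

# Three separate requests become one batched pass.
requests = [[1, 2, 3], [4, 5], [6]]
batch = pad_and_stack(requests)
shard_forward(batch)  # one (3, 3) pass instead of three passes
```

In a real implementation the batcher would also track which row belongs to which request so each caller receives only its own output, and an attention mask would exclude the padding positions.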
Reward: $200 Bounty paid out with USDC on Ethereum, email alex@exolabs.net