Open AlexCheema opened 3 weeks ago
Currently you can only run one exo instance on each device.
There are some design decisions here:
I agree with supporting one exo instance that uses multiple GPUs. This approach would allow us to shard more when only one model is inferencing. What do you think?
Currently you can only run one exo instance on each device.
There are some design decisions here: