Open tgolsson opened 3 weeks ago
It would be interesting to see how far we can scale `cervo serve` or other inference servers performance-wise, to avoid needing a GPU for distributed serving of small models.