Deploying the encoder service

geohacker commented 1 year ago

Had a quick chat with @ingalls and @batpad. We'll use a basic ECS cluster with 2 autoscaling groups:

One CPU-backed group that is always on
One GPU-backed group that runs only on request
For the first pass, we'll manually increase the min size of the GPU autoscaling group when it's needed
Later, this could be managed through a simple API Gateway endpoint that modifies the min size on the cluster
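
To make the "modify the min size" step concrete, here is a minimal sketch in Python, assuming boto3 is what eventually drives it. The function name and the group name `gpu-asg` are hypothetical and do not come from the repo. `update_auto_scaling_group` is the boto3 Auto Scaling client call. The client is passed in so the logic can be exercised with a stub.

```python
from typing import Any, Dict, Optional


def set_group_min_size(
    group_name: str,
    min_size: int,
    desired_capacity: Optional[int] = None,
    client: Optional[Any] = None,
) -> Dict[str, Any]:
    """Set the minimum size of an EC2 Auto Scaling group.

    group_name is a placeholder; use the real name of the GPU group.
    """
    if min_size < 0:
        raise ValueError("min_size must be >= 0")
    if client is None:
        # Imported lazily so the function can be tested without AWS access.
        import boto3

        client = boto3.client("autoscaling")

    params: Dict[str, Any] = {
        "AutoScalingGroupName": group_name,
        "MinSize": min_size,
    }
    # Lowering MinSize alone does not terminate running instances, so the
    # caller can also pass a desired capacity when spinning the group down.
    if desired_capacity is not None:
        params["DesiredCapacity"] = desired_capacity
    return client.update_auto_scaling_group(**params)
```

Spinning up would be something like `set_group_min_size("gpu-asg", 1)`, and spinning down `set_group_min_size("gpu-asg", 0, desired_capacity=0)`. The new min size has to stay at or below the group's max size. The same function could later sit behind an API Gateway endpoint.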

@ingalls: @rbavery has already done nearly all of the work of dockerizing these in this repo (gpu and cpu), so you should be able to take that and set up ECS.

I think we could combine these into one image if you need that for simplicity. @rbavery, would that be OK, or are there other considerations?

cc @srmsoumya

rbavery commented 1 year ago

Talked with @geohacker, but reiterating here: I separated out the GPU encoder service on the assumption that it would run on demand to keep costs down. Whenever new imagery becomes available, we'd spin up the GPU encoder and then spin it down once we have the embeddings.

The CPU decoder would be used more frequently and consistently on top of the generated embeddings, so I was thinking it should be its own separate service that is always running in the backend for something like PEARL.

If we want to bundle the services so that both run on the CPU, that's easy enough to do by including two .mar archives when spinning up the TorchServe containers. But I think we'd want to make sure the latency of the CPU encoder satisfies user requirements. I've seen anywhere from 10 to 50 seconds per image encoding, depending on image size and which machine I ran the encoding on.
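
As a rough illustration of the bundled option, the sketch below builds a TorchServe start command that registers two archives at once. It assumes the standard `torchserve --start --model-store ... --models name=archive.mar ...` CLI form. The model names, archive file names, and model store path are hypothetical placeholders, not the ones used in the repo.

```python
from typing import Dict, List


def torchserve_start_command(model_store: str, models: Dict[str, str]) -> List[str]:
    """Build the argv for starting TorchServe with several .mar archives.

    models maps a served model name to its archive file inside model_store.
    """
    if not models:
        raise ValueError("at least one model archive is required")
    cmd = ["torchserve", "--start", "--model-store", model_store, "--models"]
    # Each entry becomes name=archive.mar; --models accepts several of them.
    cmd += [f"{name}={archive}" for name, archive in models.items()]
    return cmd


if __name__ == "__main__":
    # Placeholder names for the encoder and decoder archives.
    print(" ".join(torchserve_start_command(
        "/home/model-server/model-store",
        {"sam_encoder": "sam_encoder.mar", "sam_decoder": "sam_decoder.mar"},
    )))
```

The resulting list could be passed to `subprocess.run` or used as the container command. Whether a single CPU container is acceptable still depends on the encoder latency noted above.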

rbavery commented 1 year ago

This is deployed, closing!

developmentseed / segment-anything-services #9