Support customizable prompts, multiple points and bbox for decoder

rbavery commented 1 year ago

With the current service, you can only supply one point. But SAM supports supplying N points, or a bbox, which should greatly improve mask detection accuracy.

We can support more prompts by adapting handler_decode.py to accept a different payload and pass the prompt inputs more flexibly to ort.InferenceSession. This would involve:

Changing the structure of the expected payload. Currently handler_decode.py expects:

{
    "image_embeddings": encoded_embedding_string,
    "image_shape": img_shape,
    "input_label": input_label, # needs to be updated and made more flexible
    "input_point": input_point  # needs to be updated and made more flexible
}

Changing how we pass this information to decoding inference. I think we might be able to keep the same input shapes for both parameters (input_point and input_label), since the empty dimension I'm adding in both cases probably represents the potential for multiple points.
Handling a bbox input. I'm not sure how to set this up yet, but I can look into it when it becomes a priority. I think handling N points is higher priority since it's easier on the user.
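
As a rough sketch of what the more flexible prompt handling could look like (not the repo's actual implementation): the official SAM ONNX export example packs N points into a `point_coords` array of shape (1, K, 2) and a `point_labels` array of shape (1, K), appends a padding point with label -1 when no box is given, and encodes a box as its two corners with labels 2 and 3. The function name `build_prompt_inputs` and the `bbox` argument are hypothetical, and the rescaling of coordinates into the model's resized input frame is assumed to happen elsewhere.

```python
import numpy as np


def build_prompt_inputs(points=None, labels=None, bbox=None):
    """Pack N points and/or one box into SAM-ONNX-style prompt arrays.

    Hypothetical helper. Returns (point_coords, point_labels) with shapes
    (1, K, 2) and (1, K). Coordinates are assumed to be already rescaled
    to the frame the decoder expects.
    """
    coords = np.asarray(points if points is not None else [],
                        dtype=np.float32).reshape(-1, 2)
    labs = np.asarray(labels if labels is not None else [],
                      dtype=np.float32).reshape(-1)
    if len(coords) != len(labs):
        raise ValueError("each point needs exactly one label")
    if len(coords) == 0 and bbox is None:
        raise ValueError("supply at least one point or a bbox")

    if bbox is not None:
        # A box [x0, y0, x1, y1] becomes two corner points labeled 2 and 3.
        extra_coords = np.asarray(bbox, dtype=np.float32).reshape(2, 2)
        extra_labels = np.array([2, 3], dtype=np.float32)
    else:
        # Without a box, a single padding point with label -1 is appended.
        extra_coords = np.zeros((1, 2), dtype=np.float32)
        extra_labels = np.array([-1], dtype=np.float32)

    # The leading axis of size 1 is the batch dimension.
    point_coords = np.concatenate([coords, extra_coords], axis=0)[None, :, :]
    point_labels = np.concatenate([labs, extra_labels], axis=0)[None, :]
    return point_coords, point_labels


# One foreground point (label 1) and one background point (label 0).
pc, pl = build_prompt_inputs([[10, 20], [30, 40]], [1, 0])
print(pc.shape, pl.shape)
```

The two arrays would then go into the decoder's input dictionary alongside the image embeddings, so the payload only needs `input_point` and `input_label` to become lists rather than single values.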

wouellette commented 1 year ago

Hi @rbavery,

Any updates on this one? If no work has been done on the DevSeed side, I think I can work on an implementation before the end of the year.

Just wanted to check in first to see if any work had been done on your side.

rbavery commented 1 year ago

Hi @wouellette, no progress from my end, I've been caught up with other things. It looks like @Rub21 has made some progress in #25 but I'm sure help would be appreciated! Happy to review any PRs.

rbavery commented 1 year ago

@wouellette never mind, we've picked this back up: https://github.com/developmentseed/segment-anything-services/pull/32

and @Rub21 has set up some great demo functionality of the multi foreground point decoder on some prepared embeddings, see https://devseed.com/ds-annotate?project=brussels

Rub21 commented 1 year ago

Actually, I still need to do some work on the frontend to support background and foreground points at the same time, for features that require it. The backend already supports that.

https://github.com/developmentseed/segment-anything-services/assets/1152236/457b6ded-516f-4542-8d20-4e5bdf3b83f9

wouellette commented 1 year ago

Ok so if I understand correctly, the handler handles all prompt types through the decode_single_point method: https://github.com/developmentseed/segment-anything-services/blob/8b1d69970ec6afb404f76e669dc913883e077595/handler_decode.py#L25-L31

rbavery commented 12 months ago

@wouellette not quite yet, but we're going to add separate decode functions for different payload types.
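
One way such a split could be wired up, purely as an illustration: inspect which prompt keys the payload carries and route to a matching decode function. Every name here (`select_decoder`, the `bbox` key, and the decoder labels) is hypothetical and not taken from handler_decode.py.

```python
def select_decoder(payload):
    """Pick a decode path from the prompt keys present in the payload.

    Hypothetical dispatcher: returns a label naming the decode function
    that a handler might call for this payload type.
    """
    points = payload.get("input_point") or []
    has_bbox = payload.get("bbox") is not None

    if points and has_bbox:
        return "decode_points_and_bbox"
    if has_bbox:
        return "decode_bbox"
    if len(points) > 1:
        return "decode_multi_point"
    if len(points) == 1:
        return "decode_single_point"
    raise ValueError("payload contains no point or bbox prompt")


print(select_decoder({"input_point": [[10, 20], [30, 40]], "input_label": [1, 0]}))
```

Keeping the routing in one place would let each decode function validate only the payload shape it cares about.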

developmentseed / segment-anything-services

Support customizable prompts, multiple points and bbox for decoder #7