SysCV / sam-hq

Segment Anything in High Quality [NeurIPS 2023]
https://arxiv.org/abs/2306.01567
Apache License 2.0

Can I have a more specific usage example (code snippet) in README.md? #119

Open stevezkw1998 opened 10 months ago

stevezkw1998 commented 10 months ago

I want to get started with this step: https://github.com/SysCV/sam-hq?tab=readme-ov-file#getting-started

from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)

But the placeholder parameters like "<model_type>" and "<path/to/checkpoint>" are not specific enough for me to understand how to use this tool.

For example, should the image parameter be an absolute path, an image object, or a multi-dimensional array?
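
For reference, this is my own guess at how the snippet might be filled in (the "vit_l" registry key and the checkpoint path are my assumptions; please correct me if any of this is wrong):

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# guess: model type is one of the registry keys ("vit_b" / "vit_l" / "vit_h")
sam = sam_model_registry["vit_l"](checkpoint="pretrained_checkpoint/sam_hq_vit_l.pth")
predictor = SamPredictor(sam)

# guess: the image is passed as an RGB NumPy array, not as a file path
image = cv2.imread("demo/input_imgs/example0.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# guess: the prompt is a box in pixel coordinates
input_box = np.array([[4, 13, 1007, 1023]])
masks, _, _ = predictor.predict(box=input_box, multimask_output=False)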

I would appreciate it if you could provide a specific usage example for HQ-SAM inference if possible. Thanks!

lkeab commented 10 months ago

Hi, we provided a demo file here for you to refer to: python demo/demo_hqsam.py

You can also refer to the Colab notebook here.

stevezkw1998 commented 10 months ago


Got it, thank you for your help, I will go through it. In my humble view, maybe only sam_checkpoint should be necessary to load the model, since the model_type could be inferred from sam_checkpoint; fewer inputs would make your tool even easier to use :)
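
Something like this rough sketch is what I have in mind (the helper below is hypothetical, not part of the repo; it assumes the checkpoint filename contains the model type, e.g. sam_hq_vit_l.pth):

import os
from segment_anything import SamPredictor, sam_model_registry

def guess_model_type(checkpoint_path):
    # hypothetical helper: infer the registry key from the checkpoint filename
    name = os.path.basename(checkpoint_path)
    for key in ("vit_h", "vit_l", "vit_b"):
        if key in name:
            return key
    raise ValueError(f"cannot infer model type from {name!r}")

checkpoint = "pretrained_checkpoint/sam_hq_vit_l.pth"
sam = sam_model_registry[guess_model_type(checkpoint)](checkpoint=checkpoint)
predictor = SamPredictor(sam)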

stevezkw1998 commented 10 months ago

I noticed there are these lines:

device = "cuda"
sam.to(device=device)

I am not sure what device I should pass if I run this as a Docker image in a Kubeflow pipeline (Kubeflow will pre-allocate one GPU for it). I would be very grateful if someone could provide some suggestions, thanks.
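
For what it's worth, this is what I am planning to try: fall back to CPU when no GPU is visible inside the container (this assumes the image ships a CUDA-enabled PyTorch build; I have not verified it on Kubeflow yet):

import torch

# use the GPU that Kubeflow allocated if PyTorch can see it, otherwise run on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device=device)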

stevezkw1998 commented 10 months ago

And in these lines:

input_box = np.array([[4,13,1007,1023]])
input_point, input_label = None, None

is the bbox in [w,h,x,y] format or [x1,y1,x2,y2] format?
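
To make the question concrete, this is how I would convert an origin-plus-size box to corner format, in case the predictor expects [x1,y1,x2,y2] like the original SAM (I have not confirmed this; the numbers below are just the demo box reinterpreted):

import numpy as np

# assume the box is given as [x, y, w, h] and the predictor wants corners
x, y, w, h = 4, 13, 1003, 1010
input_box = np.array([[x, y, x + w, y + h]])   # -> [[4, 13, 1007, 1023]]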

stevezkw1998 commented 10 months ago

And I have another question:

image = cv2.imread('demo/input_imgs/example0.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
input_box = np.array([[4,13,1007,1023]])
input_point, input_label = None, None
predictor.set_image(image)
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False,
    hq_token_only=False,
)

If I input a bbox that just covers the whole image and set multimask_output=True, will I get multi-object results after inference?
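
For context, my current understanding (based on the original SAM API, so it may not carry over exactly to HQ-SAM) is that multimask_output=True returns several candidate masks for the same prompt rather than one mask per object. So to segment several objects I would probably call predict once per box, roughly like this (the coordinates below are made up):

import numpy as np

boxes = np.array([
    [4, 13, 500, 500],       # hypothetical box around object 1
    [520, 13, 1007, 1023],   # hypothetical box around object 2
])
all_masks = []
for box in boxes:
    masks, scores, logits = predictor.predict(
        point_coords=None,
        point_labels=None,
        box=box[None, :],            # keep the (1, 4) shape used in the demo
        multimask_output=False,
        hq_token_only=False,
    )
    all_masks.append(masks[0])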