baaivision / tokenize-anything

[ECCV 2024] Tokenize Anything via Prompting
Apache License 2.0

Question on Heuristic Routing Strategy #14

Open Jianqiuer opened 8 months ago

Jianqiuer commented 8 months ago

I've been exploring the implementation of the heuristic routing strategy in your project and came across an operation that piqued my curiosity. The strategy doesn't seem to use the first prediction (IoU score index 0) directly for routing decisions. Instead, there appears to be an operation that subtracts 1000 from the score of the initial mask prediction.

Could you please clarify the underlying principle behind this approach? I'm particularly interested in understanding:

I believe understanding this would greatly improve my grasp of the heuristic routing strategy's design and its implications for the system's overall performance.

Looking forward to your insights.

Thank you for your time and consideration.

PhyscalX commented 7 months ago

@Jianqiuer

  1. We follow SAM's conclusion that box prompts are not ambiguous. As a result, we always select the first mask token for box prompts.

  2. Score subtraction is a simple vectorized implementation of both SAM's routing strategy and ours. We follow the ONNX wrapper code for SAM. [Code].

  3. Our routing strategy differs slightly from SAM's implementation. We rethought the ambiguity issue for K-point prompts. Estimating an accurate "K" is typically non-trivial in both the training and evaluation phases. For simplicity, we always select the top-ranked mask token for point prompts.
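
For readers following along, the three points above can be illustrated with a minimal NumPy sketch. It is not the project's actual code. It assumes that mask token 0 is the one reserved for box prompts and that point prompts pick the highest-scoring token among the rest. The function name `route_mask_tokens`, the `is_box` flag, the `PENALTY` constant, and the example scores are all hypothetical.

```python
import numpy as np

# Hypothetical constant: any value far outside the IoU score range works.
PENALTY = 1000.0


def route_mask_tokens(iou_scores, is_box):
    """Pick one mask token per prompt without explicit branching.

    iou_scores: (N, M) predicted IoU score for each of M mask tokens.
    is_box:     (N,) booleans, True where the prompt is a box.
    Returns an (N,) array of selected token indices.
    """
    scores = np.array(iou_scores, dtype=np.float32)  # copy, keep input intact
    is_box = np.asarray(is_box, dtype=bool)
    # Assumption: token 0 is the unambiguous (box) token.
    # Adding PENALTY forces token 0 for boxes; subtracting it
    # removes token 0 from the ranking for point prompts.
    scores[:, 0] += np.where(is_box, PENALTY, -PENALTY)
    return scores.argmax(axis=1)


iou = [[0.95, 0.60, 0.80, 0.70],
       [0.95, 0.60, 0.80, 0.70]]
selected = route_mask_tokens(iou, is_box=[True, False])
print(selected)  # [0 2]
```

The offset replaces an `if` on the prompt type with plain arithmetic followed by a single `argmax`, which is what makes this kind of routing easy to vectorize and export.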