CVHub520 / X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.
GNU General Public License v3.0
4.14k stars 473 forks

Do the SAM-series ONNX models support automatic inference mode? #709

Open Ultraman6 opened 4 days ago

Ultraman6 commented 4 days ago

Search before asking

Question

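I ran into a problem and am not sure whether this is simply how the official export format works. With the ONNX-format decoder, the output masks have a batch size and a depth of 1, whereas the original dynamic model from the official site defaults to 64 and 3. After inspecting the ONNX decoder, I found that num_multimask_outputs and transformer_dim, which were originally dynamic, are both hard-coded in it, so the ONNX decoder can only output masks of shape (1,1,x,y). Is there any way to solve this so that it can support auto mode?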

Additional

No response

CVHub520 commented 4 days ago

Hi there! Regarding your question about automatic inference mode support for SAM's ONNX models:

Yes, it's possible to support automatic inference, but you'll need to handle the tensor transformations correctly. The key is properly managing the mask dimensions between the decoder output and the network's requirements.

I recommend using the related samexporter project as a reference. You can modify its export functionality to match your specific needs, and then adapt the implementation in X-AnyLabeling's segment_anything_2.py accordingly.

For your specific issue with the masks dimensions (1,1,x,y), you'll need to:

  1. Process the decoder output to match the expected num_multimask_outputs and transformer_dim settings
  2. Adjust the tensor reshaping operations to maintain compatibility with the model's expected input format
  3. Handle the mask transformations appropriately in your inference pipeline
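If re-exporting the decoder is not an option, one way to approximate auto mode with a decoder fixed to (1,1,x,y) output is to prompt it once per grid point and collect the results. The following is a minimal sketch under stated assumptions, not the X-AnyLabeling or samexporter implementation: `decode_one` is a hypothetical wrapper you would write around your own ONNX decoder session, returning one mask and one score per point; the 32-per-side grid and the 0.88 score threshold are illustrative values. Deduplicating overlapping masks (for example with NMS) is left out.

```python
from typing import Any, Callable, List, Tuple

Point = Tuple[float, float]


def build_point_grid(n_per_side: int) -> List[Point]:
    """Return n_per_side**2 normalized (x, y) points, each centered in its grid cell."""
    coords = [(i + 0.5) / n_per_side for i in range(n_per_side)]
    return [(x, y) for y in coords for x in coords]


def run_automatic(
    decode_one: Callable[[Point], Tuple[Any, float]],  # hypothetical decoder wrapper
    width: int,
    height: int,
    n_per_side: int = 32,        # assumed grid density
    score_thresh: float = 0.88,  # assumed quality threshold
) -> List[Tuple[Point, Any, float]]:
    """Prompt a single-mask decoder once per grid point and keep confident masks."""
    results = []
    for nx, ny in build_point_grid(n_per_side):
        point = (nx * width, ny * height)  # scale to pixel coordinates
        mask, score = decode_one(point)    # one (1,1,x,y) mask per call
        if score >= score_thresh:
            results.append((point, mask, score))
    return results
```

Because the image embedding from the encoder is computed once and reused, the extra cost of this approach is one decoder call per grid point rather than one call per batch of 64.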

Let me know if you need any clarification on implementing these changes!