Hi @ckcraig01, thanks for your valuable suggestion. I appreciate your feedback, and I do plan to gradually add export support for SAM-HQ and Light-HQ-SAM in the future.
It's important to note that X-AnyLabeling currently integrates SAM and Mobile-SAM from this repository, while other models are part of separate deployment tutorials for different projects. For more information on using custom models, you can refer to this documentation.
I also welcome contributions from the community, so if you're interested in contributing or have specific features in mind, please feel free to submit a PR. I'm here to assist and support your efforts.
Thanks again for your support and feedback!
Dear Author,
Great to hear about this. I'm not sure whether the sam-exporter could also handily support HQ-SAM/Light-HQ-SAM; I just hope you might give it a quick try to see if it works. We may try it out later ourselves, but we are currently focusing on YOLOv8-pose related feedback.
As we have given feedback to CVAT on their layer design, here are a few preview comments FYI (we may open a new issue later on): (1) the annotator could send the detection bbox to the bottom layer so the annotator can adjust keypoints, or a hide button for hiding the bbox or keypoints; (2) a skeleton design (maybe difficult to achieve with the current UI?) for better identifying which keypoints correspond to which person; (3) clicking a keypoint in the image, the "Objects" panel on the right would move to the corresponding position.
By the way, since you already have Grounding DINO and SAM/MobileSAM, how about also enabling a Grounding-SAM capability that turns a word prompt into a bbox + segmentation mask? That would also be great.
Sorry for bringing up so much information here; we truly appreciate your contribution to the community and wish you all the best.
Hi @ckcraig01, thank you for your comprehensive feedback and suggestions! Regarding HQ-SAM (previously only the decoder export was supported), I've implemented an exporter script for both the encoder and decoder parts of HQ-SAM. You can access it here if needed.
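For reference, the plain-SAM export path looks roughly like the sketch below, built on the official segment-anything utilities; the model type, checkpoint names, and opset are placeholders, and this is an illustration rather than the exporter script itself. HQ-SAM's encoder additionally returns intermediate ViT embeddings that feed the HQ output token, so the actual exporter has to extend these inputs and outputs accordingly.

```python
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

# Load a vanilla SAM checkpoint; "vit_l" and the .pth path are placeholders.
sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth")
sam.eval()

# Encoder: fixed 1024x1024 RGB input, single "image_embeddings" output.
torch.onnx.export(
    sam.image_encoder,
    torch.randn(1, 3, 1024, 1024),
    "sam_encoder.onnx",
    input_names=["image"],
    output_names=["image_embeddings"],
    opset_version=17,
)

# Decoder: the official ONNX wrapper that bundles the prompt encoder and mask decoder.
onnx_model = SamOnnxModel(model=sam, return_single_mask=True)
embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size),
    "point_coords": torch.randint(0, 1024, (1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(0, 4, (1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, 4 * embed_size[0], 4 * embed_size[1]),
    "has_mask_input": torch.tensor([1], dtype=torch.float),
    "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
}
torch.onnx.export(
    onnx_model,
    tuple(dummy_inputs.values()),
    "sam_decoder.onnx",
    input_names=list(dummy_inputs.keys()),
    output_names=["masks", "iou_predictions", "low_res_masks"],
    dynamic_axes={"point_coords": {1: "num_points"}, "point_labels": {1: "num_points"}},
    opset_version=17,
)
```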
Now, addressing your points:
(1) The annotator could send the detection bbox to the bottom layer so the annotator can adjust keypoints, or a hide button for hiding the bbox or keypoints.
- You can show/hide the detection bounding box or keypoints in the right panel's Objects list by toggling the checkbox corresponding to the target you are interested in.
(2) Skeleton design (maybe difficult to achieve with the current UI?), for better identifying which keypoints correspond to which person.
- The skeleton mode is currently based on the point mode, mainly used for tasks like lane detection. While this specific functionality is under consideration, you can leverage the checkbox and Group features to assist in your tasks.
(3) Clicking a keypoint in the image, the "Objects" panel on the right will move to the corresponding position.
- The functionality of clicking on a keypoint in the image and having the "Objects" panel move to the corresponding position is supported. Please make sure you are in edit mode to use this feature.
Lastly, I appreciate your suggestion about enabling Grounding-SAM's word prompt => bbox + segmentation mask functionality. It's a great idea, and you can now feel free to try out the latest version, X-AnyLabeling v2.0.0, where I've integrated more features, including Grounding-SAM. Your suggestions and feedback are highly valued, and I hope you enjoy using the tool!
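Conceptually, the word prompt => bbox + mask chain works like the sketch below. Note this is a standalone illustration using the open-source groundingdino and segment-anything packages rather than X-AnyLabeling's internal code; the config path, checkpoint names, thresholds, and demo image are placeholders.

```python
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1) Word prompt -> bounding boxes with Grounding DINO.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth", device="cpu")
image_source, image = load_image("demo.jpg")
boxes, logits, phrases = predict(
    model=dino, image=image, caption="person . dog .",
    box_threshold=0.35, text_threshold=0.25, device="cpu",
)

# Grounding DINO returns normalized cxcywh boxes; convert to absolute XYXY pixels.
h, w, _ = image_source.shape
boxes_xyxy = boxes * torch.tensor([w, h, w, h])
boxes_xyxy[:, :2] -= boxes_xyxy[:, 2:] / 2   # cx, cy -> x1, y1
boxes_xyxy[:, 2:] += boxes_xyxy[:, :2]       # w, h   -> x2, y2

# 2) Boxes -> segmentation masks with SAM (any SAM-family predictor could slot in here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
for box, phrase in zip(boxes_xyxy.numpy(), phrases):
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    print(phrase, box.astype(int), "mask area:", int(masks[0].sum()))
```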
Sorry for the late reply; I wanted to spend more time with v2.0 so I could provide more informative feedback from my side. I would like to say thanks for your prompt feedback.
The upgrade to HQ-SAM is indeed impressive. Perhaps you could also consider integrating Light HQ-SAM from the same repo.
Hi @ckcraig01, I recommend you try the real-time model application with EdgeSAM, an accelerated variant of the Segment Anything Model (SAM) optimized for efficient execution on edge devices with minimal compromise in performance.
It achieves a 40-fold speed increase compared to the original SAM, and outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. EdgeSAM is also the first SAM variant that can run at over 30 FPS on an iPhone 14.
Dear Author,
As I understand it, you may be using this very repo for the ONNX export of the SAM series.
But now, both SAM and SAM-HQ (and Light-HQ-SAM in the same repo) support ONNX export.
It may not be urgent, but I hope you might consider adding these features someday.
The reason is that Grounded-SAM also claims a significant improvement when moving from SAM to HQ-SAM (and Light-HQ-SAM).
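For what it's worth, and this is just my assumption rather than something I have tested: the sam-hq repo appears to keep the same `segment_anything` package layout and registry interface as vanilla SAM (its PyPI build may expose it as `segment_anything_hq` instead), so the existing export path might carry over with mostly a checkpoint/model-type change. A rough sketch, with guessed registry keys and placeholder checkpoint names:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # as laid out in the sam-hq repo

# HQ-SAM; a "vit_tiny" key would presumably select Light-HQ-SAM (my guess).
sam_hq = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")

predictor = SamPredictor(sam_hq)
predictor.set_image(np.zeros((480, 640, 3), dtype=np.uint8))  # stand-in for a real RGB image
masks, scores, _ = predictor.predict(
    box=np.array([100, 100, 400, 400]),  # XYXY box prompt
    multimask_output=False,
)
print(masks.shape, scores)
```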
Thanks again for your great work.