Hi @ckcraig01, thanks for your valuable suggestion. I appreciate your feedback, and I do plan to gradually add export support for SAM-HQ and Light-HQ-SAM in the future.
It's important to note that X-AnyLabeling currently integrates SAM and Mobile-SAM from this repository, while other models are part of separate deployment tutorials for different projects. For more information on using custom models, you can refer to this documentation.
I also welcome contributions from the community, so if you're interested in contributing or have specific features in mind, please feel free to submit a PR. I'm here to assist and support your efforts.
Thanks again for your support and feedback!
Dear Author,
Great to hear about this. I'm not sure whether the sam-exporter could also handily support HQ-SAM/Light-HQ-SAM; I just hope you might give it a quick try to see if it works. We may try it out later ourselves, but we are currently focusing on YOLOv8-pose related feedback.
As we have given feedback to CVAT on their layer design, here are a few preview comments FYI (we may open a new issue later on): (1) the annotator could send the detection bbox to the bottom layer so the annotator can adjust keypoints, or a hide button for hiding the bbox or keypoints; (2) a skeleton design (maybe difficult to achieve with the current UI?) for better identifying which keypoints correspond to which person; (3) clicking a keypoint in the image, the "Objects" panel on the right would move to the corresponding position.
By the way, since you already have Grounding DINO and SAM/MobileSAM, how about also enabling a Grounding-SAM capability that turns a word prompt into a bbox + segmentation mask? That would also be great.
Sorry for bringing up so much information here; we truly appreciate your contribution to the community and wish you all the best.
Hi @ckcraig01, thank you for your comprehensive feedback and suggestions! Regarding HQ-SAM (previously only the decoder export was supported), I've implemented an exporter script for both the encoder and decoder parts of HQ-SAM. You can access it here if needed.
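For reference, the plain-SAM export path looks roughly like the sketch below, built on the official segment-anything utilities; the model type, checkpoint names, and opset are placeholders, and this is an illustration rather than the exporter script itself. HQ-SAM's encoder additionally returns intermediate ViT embeddings that feed the HQ output token, so the actual exporter has to extend these inputs and outputs accordingly.

```python
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

# Load a vanilla SAM checkpoint; "vit_l" and the .pth path are placeholders.
sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth")
sam.eval()

# Encoder: fixed 1024x1024 RGB input, single "image_embeddings" output.
torch.onnx.export(
    sam.image_encoder,
    torch.randn(1, 3, 1024, 1024),
    "sam_encoder.onnx",
    input_names=["image"],
    output_names=["image_embeddings"],
    opset_version=17,
)

# Decoder: the official ONNX wrapper that bundles the prompt encoder and mask decoder.
onnx_model = SamOnnxModel(model=sam, return_single_mask=True)
embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size),
    "point_coords": torch.randint(0, 1024, (1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(0, 4, (1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, 4 * embed_size[0], 4 * embed_size[1]),
    "has_mask_input": torch.tensor([1], dtype=torch.float),
    "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
}
torch.onnx.export(
    onnx_model,
    tuple(dummy_inputs.values()),
    "sam_decoder.onnx",
    input_names=list(dummy_inputs.keys()),
    output_names=["masks", "iou_predictions", "low_res_masks"],
    dynamic_axes={"point_coords": {1: "num_points"}, "point_labels": {1: "num_points"}},
    opset_version=17,
)
```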
Now, addressing your points:
(1) The annotator could send the detection bbox to the bottom layer so the annotator can adjust keypoints, or a hide button for hiding the bbox or keypoints.
- You can show/hide the detection bounding box or keypoints in the right panel's Objects list by toggling the checkbox corresponding to the target you are interested in.
(2) Skeleton design (maybe difficult to achieve with the current UI?), for better identifying which keypoints correspond to which person.
- The skeleton mode is currently based on the point mode, mainly used for tasks like lane detection. While this specific functionality is under consideration, you can leverage the checkbox and Group features to assist in your tasks.
(3) Clicking a keypoint in the image, the "Objects" panel on the right will move to the corresponding position.
- The functionality of clicking on a keypoint in the image and having the "Objects" panel move to the corresponding position is supported. Please make sure you are in edit mode to use this feature.
Lastly, I appreciate your suggestion about enabling Grounding-SAM's word prompt => bbox + segmentation mask functionality. It's a great idea, and you can now feel free to try out the latest version, X-AnyLabeling v2.0.0, where I've integrated more features, including Grounding-SAM. Your suggestions and feedback are highly valued, and I hope you enjoy using the tool!
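Conceptually, the word prompt => bbox + mask chain works like the sketch below. Note this is a standalone illustration using the open-source groundingdino and segment-anything packages rather than X-AnyLabeling's internal code; the config path, checkpoint names, thresholds, and demo image are placeholders.

```python
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1) Word prompt -> bounding boxes with Grounding DINO.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth", device="cpu")
image_source, image = load_image("demo.jpg")
boxes, logits, phrases = predict(
    model=dino, image=image, caption="person . dog .",
    box_threshold=0.35, text_threshold=0.25, device="cpu",
)

# Grounding DINO returns normalized cxcywh boxes; convert to absolute XYXY pixels.
h, w, _ = image_source.shape
boxes_xyxy = boxes * torch.tensor([w, h, w, h])
boxes_xyxy[:, :2] -= boxes_xyxy[:, 2:] / 2   # cx, cy -> x1, y1
boxes_xyxy[:, 2:] += boxes_xyxy[:, :2]       # w, h   -> x2, y2

# 2) Boxes -> segmentation masks with SAM (any SAM-family predictor could slot in here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
for box, phrase in zip(boxes_xyxy.numpy(), phrases):
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    print(phrase, box.astype(int), "mask area:", int(masks[0].sum()))
```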
Sorry for the late reply; I wanted to spend more time with v2.0 so I could provide more informative feedback from my side. I would like to say thanks for your prompt feedback.
The upgrade to HQ-SAM is indeed impressive. Perhaps you could also consider integrating Light HQ-SAM from the same repo.
Hi @ckcraig01, I recommend you try the real-time model application with EdgeSAM, an accelerated variant of the Segment Anything Model (SAM) optimized for efficient execution on edge devices with minimal compromise in performance.
It achieves a 40-fold speed increase compared to the original SAM, and outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. EdgeSAM is also the first SAM variant that can run at over 30 FPS on an iPhone 14.
Dear Author,
As I understand it, you may be using this very repo for the ONNX export of the SAM series.
But now, both SAM and SAM-HQ (and Light-HQ-SAM in the same repo) support ONNX export.
It may not be urgent, but I hope you might consider adding these features someday.
The reason is that Grounded-SAM also claims a significant improvement when moving from SAM to HQ-SAM (and Light-HQ-SAM).
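For what it's worth, and this is just my assumption rather than something I have tested: the sam-hq repo appears to keep the same `segment_anything` package layout and registry interface as vanilla SAM (its PyPI build may expose it as `segment_anything_hq` instead), so the existing export path might carry over with mostly a checkpoint/model-type change. A rough sketch, with guessed registry keys and placeholder checkpoint names:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # as laid out in the sam-hq repo

# HQ-SAM; a "vit_tiny" key would presumably select Light-HQ-SAM (my guess).
sam_hq = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")

predictor = SamPredictor(sam_hq)
predictor.set_image(np.zeros((480, 640, 3), dtype=np.uint8))  # stand-in for a real RGB image
masks, scores, _ = predictor.predict(
    box=np.array([100, 100, 400, 400]),  # XYXY box prompt
    multimask_output=False,
)
print(masks.shape, scores)
```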
Thanks again for your great work.