ali-vilab / AnyDoor

Official implementations for paper: Anydoor: zero-shot object-level image customization
https://ali-vilab.github.io/AnyDoor-Page/
MIT License
3.99k stars, 365 forks

Enhancement of Object Segmentation Precision for Improved Image Customisation #39

Open yihong1120 opened 10 months ago

yihong1120 commented 10 months ago

Dear AnyDoor Contributors,

I hope this message finds you well. I am writing to address a potential area for enhancement within the AnyDoor project, specifically pertaining to the precision of object segmentation during the image customisation process.

Upon utilising the AnyDoor model for various image editing tasks, I have observed that while the overall performance is commendable, there are instances where the object segmentation module could benefit from increased accuracy. This is particularly evident in images with intricate backgrounds or when objects have complex edges.

The current segmentation approach, although effective for a broad range of scenarios, occasionally struggles with fine details, leading to a less than optimal customisation outcome. This is especially true for tasks that demand high fidelity, such as virtual try-on applications where precise garment edges are crucial.

To illustrate, I have attached a series of images (see attachments) that demonstrate the challenges faced with the current segmentation algorithm. In these examples, you will notice that the segmentation mask does not fully capture the nuanced contours of the objects, resulting in a slight misalignment in the customised images.

I propose the exploration of advanced segmentation techniques, such as those employing deep learning architectures specifically designed for edge detection, or the integration of interactive segmentation tools that allow for user refinement. The latter could be particularly beneficial in providing end-users with the ability to make fine-grained adjustments, thereby enhancing the overall quality of the customisation.

I believe that addressing this aspect could significantly elevate the user experience and expand the practical applications of the AnyDoor model. I would be keen to hear your thoughts on this matter and discuss potential collaborative efforts to develop and integrate such improvements.

Thank you for your time and consideration. I look forward to your response and am excited about the prospect of contributing to the advancement of the AnyDoor project.

Best regards, yihong1120

XavierCHEN34 commented 10 months ago

Hi yihong,

Thank you for your suggestions!

I totally agree with you that segmenting the reference object with high quality is an essential step for AnyDoor. I have updated the code with a mask refinement module, which could improve the segmentation quality to some extent.
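For readers wondering what a mask refinement step can look like in general, here is a minimal sketch. It is not the module mentioned above, whose implementation is not described in this thread. It assumes the mask is a binary NumPy array, and the function name `refine_mask` is hypothetical. The sketch keeps only the largest 4-connected foreground region and fills any enclosed holes.

```python
from collections import deque

import numpy as np


def _label_components(mask):
    """Label 4-connected True regions with 1..n using breadth-first search."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            count += 1
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def refine_mask(mask):
    """Hypothetical cleanup: keep the largest region, then fill its holes."""
    mask = np.asarray(mask).astype(bool)
    labels, count = _label_components(mask)
    if count == 0:
        return mask.copy()
    # Keep only the largest foreground component (drops stray specks).
    sizes = np.bincount(labels.ravel())[1:]
    largest = labels == (int(np.argmax(sizes)) + 1)
    # A hole is a background region that never touches the image border.
    bg_labels, _ = _label_components(~largest)
    border = np.concatenate(
        [bg_labels[0], bg_labels[-1], bg_labels[:, 0], bg_labels[:, -1]])
    outside = np.isin(bg_labels, border[border > 0])
    return largest | (~largest & ~outside)
```

This kind of cleanup only removes specks and holes; it does not recover fine contours such as garment edges, which is where a learned or interactive approach would be needed.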

Previously, I did some research on interactive segmentation, for example ClickSEG (https://github.com/XavierCHEN34/ClickSEG). I am also willing to integrate it into the AnyDoor system, but integrating it into Gradio may require some effort. 😂 So, for now, we are giving higher priority to improving the quality of AnyDoor and releasing some specific versions for downstream tasks.

Best

alexandrewillame commented 10 months ago

Being able to use Meta SAM (Segment Anything) would be amazing!

https://github.com/facebookresearch/segment-anything
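As a rough idea of how click-based masks from SAM could be obtained, here is a minimal sketch. It relies on the `SamPredictor` interface documented in the segment-anything repository (`set_image`, then `predict` with `point_coords`, `point_labels` and `multimask_output`). The helper name `best_mask_from_clicks`, the click format, and the checkpoint path in the usage comment are assumptions for illustration. Nothing here reflects how AnyDoor itself would integrate SAM.

```python
import numpy as np


def best_mask_from_clicks(predictor, image, clicks):
    """Return the highest-scoring mask for a set of user clicks.

    predictor: object exposing set_image(...) and predict(...), e.g. SamPredictor
    image:     HxWx3 uint8 RGB array
    clicks:    list of (x, y, is_foreground) tuples (hypothetical format)
    """
    if not clicks:
        raise ValueError("at least one click is required")
    # SAM expects point coordinates as (x, y) pairs and labels 1 (fg) / 0 (bg).
    coords = np.array([[x, y] for x, y, _ in clicks], dtype=np.float32)
    labels = np.array([1 if fg else 0 for _, _, fg in clicks], dtype=np.int32)
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=coords,
        point_labels=labels,
        multimask_output=True,
    )
    # With multimask_output=True several candidates are returned; keep the best.
    return masks[int(np.argmax(scores))]


# Usage with the real library (requires torch, segment_anything and a
# downloaded checkpoint; the path below is a placeholder):
# from segment_anything import SamPredictor, sam_model_registry
# sam = sam_model_registry["vit_h"](checkpoint="path/to/checkpoint.pth")
# mask = best_mask_from_clicks(SamPredictor(sam), image_rgb, [(320, 240, True)])
```

Passing the predictor in as an argument keeps the helper independent of how the model is loaded, so the same function could sit behind a Gradio click handler or a batch script.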

luccachiang commented 8 months ago

I did not notice this repo had changed so much. Thanks for your effort. @XavierCHEN34 I have one question, though: does the mask refinement module only work in the Gradio demo?