TencentARC / SmartEdit

Official code of SmartEdit [CVPR-2024 Highlight]

The correspondence between the masks and instructions #12

Closed lzw-lzw closed 4 months ago

lzw-lzw commented 4 months ago

Thank you for your work. I would like to know how to associate the masks with the instructions. For example, under the directory './dataset/OriginalImg/Mirror-Animal/0000/mask', there are two mask images: '0000_dog(0.60).png' and '0000_dog(0.73).png'. The corresponding instructions include:

"Change the right cat to a marmot" "Replace the right cat with a marmot" "Change the right animal to a marmot" "Replace the right animal with a marmot" "Change the left cat to a wolf", "Replace the left cat with a wolf", "Change the left animal to a wolf", "Replace the left animal with a wolf"

Apart from manually inspecting the mask images, how can I know which mask corresponds to the left cat and which one corresponds to the right cat?
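(Editor's note: while this does not answer the left/right question, part of the linkage can be recovered from the mask filenames themselves. A minimal sketch, assuming the naming convention inferred from the examples above, i.e. `<image_id>_<label>(<confidence>).png` as in `0000_dog(0.60).png`; this parser is not part of the official dataset tooling:)

```python
import re

# Pattern inferred from filenames like "0000_dog(0.60).png":
# <image_id>_<label>(<confidence>).png  -- an assumption, not a documented spec.
MASK_NAME_RE = re.compile(r"^(?P<image_id>\d+)_(?P<label>[^()]+)\((?P<conf>\d+\.\d+)\)\.png$")

def parse_mask_filename(name):
    """Return (image_id, label, confidence) parsed from a mask filename."""
    m = MASK_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected mask filename: {name!r}")
    return m.group("image_id"), m.group("label"), float(m.group("conf"))

print(parse_mask_filename("0000_dog(0.60).png"))  # ('0000', 'dog', 0.6)
```

This recovers the detected class and detector confidence per mask, but, as the thread goes on to discuss, not which instance ("left" or "right") the mask belongs to.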

yuzhou914 commented 4 months ago

Hi, the masks in the synthetic training dataset are only used to synthesize new images; they serve no other purpose.

lzw-lzw commented 4 months ago

I understand that, but I would like to use your dataset in other projects. Is there any way to link the masks and instructions together? For example, when generating synthetic data, do you save which mask image corresponds to each object mentioned in the instruction? Thank you.

LiangbinXie commented 4 months ago

@lzw-lzw For complicated cases, like two cats in one image, we first adopt Grounding SAM to segment the two cats and obtain two binary masks. Then we manually label each mask (the left cat, the right cat).

lzw-lzw commented 4 months ago
[image attached]

Thanks for your patient response. What I mean is how to differentiate between the two masks in the above image, i.e., whether "cat(0.81).png" corresponds to the left cat or the right cat. Is there any way to do this other than manually viewing the original image and the masks? Do you have any saved correspondence information between them, such as "the left cat" corresponds to the "cat(0.81).png" mask? Thanks again for your patience.
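(Editor's note: one programmatic heuristic for the left/right question — my own sketch, not part of the SmartEdit pipeline — is to compare the horizontal centroids of the binary masks and call the mask with the smaller mean x-coordinate the "left" object. The filenames and toy masks below are illustrative:)

```python
import numpy as np

def mask_centroid_x(mask):
    """Mean x (column) coordinate of the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty mask")
    return xs.mean()

def label_left_right(masks):
    """Given {filename: binary mask} for exactly two masks, return
    {'left': filename, 'right': filename} ordered by horizontal centroid."""
    if len(masks) != 2:
        raise ValueError("heuristic assumes exactly two masks")
    (name_a, mask_a), (name_b, mask_b) = masks.items()
    if mask_centroid_x(mask_a) <= mask_centroid_x(mask_b):
        return {"left": name_a, "right": name_b}
    return {"left": name_b, "right": name_a}

# Toy example: one blob on the left, one on the right of a 4x8 canvas.
left = np.zeros((4, 8), dtype=bool);  left[1:3, 0:2] = True
right = np.zeros((4, 8), dtype=bool); right[1:3, 6:8] = True
print(label_left_right({"cat(0.81).png": right, "cat(0.73).png": left}))
# {'left': 'cat(0.73).png', 'right': 'cat(0.81).png'}
```

This only works when the instructions distinguish instances horizontally ("left"/"right"); instructions that rely on other attributes would still need manual inspection or a VLM in the loop.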

LiangbinXie commented 4 months ago

I generated the binary maps with Grounding SAM. Initially, I intended to use "left cat" and "right cat" as the text prompt so the model could locate each object automatically and generate the binary map. However, I found that Grounding SAM does not have this capability, so I chose to label the masks manually. Maybe you can integrate LLaVA into the Grounding SAM pipeline.

lzw-lzw commented 4 months ago

Hi, I understand your data construction process. However, I want to use the data you have already built in another project, where I need "text prompt - mask" pairs, but I don't know which object each mask corresponds to. For example, in the image I provided above, I would like to know whether "cat(0.81).png" corresponds to the cat on the left or the cat on the right. Have you saved the correspondence between the text prompts and the mask files, so that I can use it directly? Thanks.

LiangbinXie commented 4 months ago

I'm sorry, I've already cleaned up the intermediate results. But I just remembered: you can use LISA to generate the binary masks, and it is quite accurate in most cases (e.g., "left cat", "right cat").

lzw-lzw commented 4 months ago

Thank you very much. I will have a try.