Reagan1311 / OOAL

One-Shot Open Affordance Learning with Foundation Models (CVPR 2024)
MIT License

Could you please provide the UMD dataset used for reproduction? #3

hiwmy0211 opened this issue 1 month ago

hiwmy0211 commented 1 month ago

Hello, and thank you again for your excellent work! Could you provide the UMD dataset used for training and testing the network? After downloading the original UMD dataset, I found that it cannot be fed directly into the network for reproduction. Thank you in advance for your reply!

Reagan1311 commented 1 month ago

Hi, here is the UMD one-shot data we used for training: link. Following AffCorrs, we have changed the affordance of Ladle from "Contain" to "Scoop". For training and testing, we used mmsegmentation to conduct the experiments.
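
For reference, a rough sketch of the Ladle relabeling described above (not the actual preprocessing script; the label ids and directory layout below are assumptions):

```python
import glob

import numpy as np
from PIL import Image

# Assumed label ids; the real values depend on the label order in the data.
CONTAIN_ID, SCOOP_ID = 4, 3

# Hypothetical directory layout; point the glob at wherever the Ladle masks live.
for path in glob.glob('UMD/ladle_*/label/*.png'):
    label = np.array(Image.open(path))
    label[label == CONTAIN_ID] = SCOOP_ID  # Contain -> Scoop for Ladle only
    Image.fromarray(label).save(path)
```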

hiwmy0211 commented 1 month ago

Thank you again for your response! However, the labels downloaded from this link are all black. Is there an issue with the link?

Reagan1311 commented 1 month ago

Hi, this is because the values in the label image range from 0 to 7 (background + 7 affordances). You can assign each value a different RGB color to visualize it.
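
A minimal visualization sketch, assuming the masks are single-channel PNGs with values 0-7; the palette and file names are arbitrary placeholders:

```python
import numpy as np
from PIL import Image

# Arbitrary palette: index 0 = background, indices 1-7 = the seven affordances.
PALETTE = np.array([
    [0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255],
    [255, 255, 0], [255, 0, 255], [0, 255, 255], [255, 128, 0],
], dtype=np.uint8)

def colorize(label_path: str, out_path: str) -> None:
    label = np.array(Image.open(label_path))        # 2-D array with values 0..7
    Image.fromarray(PALETTE[label]).save(out_path)  # map each id to its color

colorize('label.png', 'label_vis.png')              # hypothetical file names
```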

hiwmy0211 commented 4 weeks ago

Thank you again for your response. I now have a clearer understanding of the training and testing settings on the UMD dataset for both the Seen and Unseen configurations. My understanding is that the test sets for the Seen and Unseen configurations are the same, consisting of all the data provided by the official website. Could you confirm whether this is correct? Additionally, could you provide the network deployed on the UMD dataset? I noticed that the evaluation metrics differ from those used on the AGD20K dataset, so I would like to ask whether you could share the code used for running the network on the UMD dataset. Thank you in advance for your reply!

Reagan1311 commented 3 weeks ago

We use the official test sets from the UMD dataset, but they are different for the seen and unseen settings. After you download the dataset, you will find that the category_split file is for the seen setting and the novel_split file is for the unseen setting. The network is the same for the UMD experiments; you can implement it using the MMSegmentation toolbox. The code I previously used was stored on a server that I no longer have access to.
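
As a starting point, here is a hedged sketch of registering the UMD data as a custom dataset in MMSegmentation (assuming the 0.x API and an img_dir/ann_dir layout; the class names, ordering, and palette below are assumptions, not taken from the original code):

```python
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class UMDAffordanceDataset(CustomDataset):
    """UMD masks with values 0-7 (background + 7 affordances)."""
    # Assumed names and order; check them against the actual label ids.
    CLASSES = ('background', 'grasp', 'cut', 'scoop', 'contain',
               'pound', 'support', 'wrap-grasp')
    PALETTE = [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255],
               [255, 255, 0], [255, 0, 255], [0, 255, 255], [255, 128, 0]]

    def __init__(self, **kwargs):
        # Label value 0 is a real background class here, so keep it.
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png',
                         reduce_zero_label=False, **kwargs)
```

With such a class registered, the seen/unseen splits could then be selected through the `split` argument of the dataset config, pointing at the category_split or novel_split file respectively.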

hiwmy0211 commented 3 weeks ago

Thank you very much for your response. I will deploy the network on the UMD dataset based on your suggestions. I also have one more question: how can networks like ZegCLIP and SAN be applied to the AGD20K dataset? I noticed that networks like ZegCLIP have specific requirements for datasets, particularly regarding the color mapping of different class labels; for example, the "car" class is represented by the color [0, 64, 96]. However, AGD20K uses sparse annotation, where even different affordance labels are assigned values close to 255. Given this, how does ZegCLIP distinguish between different affordance labels when the network is reproduced on AGD20K? If possible, could you provide the code used for reproduction, for reference and study? Thank you in advance for your reply!

Reagan1311 commented 2 weeks ago

The color mapping is specific to each dataset and is only used for visualization. Any typical or open-vocabulary segmentation network can be used for AGD20K, as long as the raw output is constrained between 0 and 1, since AGD20K is annotated with soft labels in this range. For our experiments, we used the official ZegCLIP code, which applies a sigmoid function at the end.
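
As an illustration only (not the ZegCLIP or OOAL code), here is a minimal sketch of a sigmoid-constrained output head compared against soft labels; the binary cross-entropy loss is just one reasonable choice under that assumption:

```python
import torch
import torch.nn.functional as F

def affordance_maps(logits: torch.Tensor, out_size) -> torch.Tensor:
    """logits: (B, num_affordances, h, w) raw network output."""
    logits = F.interpolate(logits, size=out_size, mode='bilinear',
                           align_corners=False)
    return torch.sigmoid(logits)        # predictions constrained to [0, 1]

def soft_label_loss(pred: torch.Tensor, soft_gt: torch.Tensor) -> torch.Tensor:
    """soft_gt: AGD20K-style soft labels normalized to [0, 1], same shape as pred."""
    return F.binary_cross_entropy(pred, soft_gt)
```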