Hi, here is the UMD one-shot data we used for training: link. Following AffCorrs, we have changed the affordance of Ladle from "Contain" to "Scoop". For training and testing, we used MMSegmentation to conduct the experiments.
Thank you again for your response! However, the labels downloaded from this link are all black. Is there an issue with the link?
Hi, this is because the values in the label image range from 0 to 7 (background + 7 affordances), so the image looks almost black when viewed directly. You can assign each value a different RGB color to visualize it.
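For reference, here is a minimal visualization sketch (not the authors' code); the palette and file names are placeholders:

```python
# Minimal sketch: map UMD label values 0..7 to RGB colors for visualization.
import numpy as np
from PIL import Image

# Illustrative palette: one RGB triple per value (background + 7 affordances).
PALETTE = np.array([[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                    [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128]],
                   dtype=np.uint8)

label = np.array(Image.open("label.png"))   # values in {0, ..., 7}, looks black
color = PALETTE[label]                      # index palette: (H, W) -> (H, W, 3)
Image.fromarray(color).save("label_vis.png")
```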
Thank you again for your response. I now have a clearer understanding of the training and testing settings under the UMD dataset for both Seen and Unseen configurations. My understanding is that the test sets for both Seen and Unseen configurations in the UMD dataset are the same, consisting of all the data provided by the official website. Could you please confirm if this understanding is correct? Additionally, could you provide the network deployed on the UMD dataset? I noticed that the evaluation metrics used are different from those on the AGD20K dataset, so I would like to ask if it would be possible for you to share the code used for running the network on the UMD dataset. Thank you in advance for your reply!
We use the official test sets from the UMD dataset, but they differ between the seen and unseen settings. After downloading the dataset, you will find that the `category_split` file is for the seen setting and the `novel_split` file is for the unseen setting.
The network is the same for the UMD experiments; you can try to implement it using the MMSegmentation toolbox. The code I previously used was stored on a server that I no longer have access to.
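As a rough sketch of how a UMD dataset class could be registered in MMSegmentation (following the 0.x-style `CustomDataset` API), something like the following should work; the class order, file suffixes, and palette here are placeholders rather than the exact setup we used:

```python
# Rough sketch of a UMD dataset class for MMSegmentation 0.x; class order,
# suffixes, and palette are placeholders, not the exact original setup.
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset


@DATASETS.register_module()
class UMDDataset(CustomDataset):
    # background + 7 affordances (Ladle's "Contain" relabelled to "Scoop")
    CLASSES = ('background', 'grasp', 'cut', 'scoop', 'contain',
               'pound', 'support', 'wrap-grasp')
    # one RGB color per label value, used only for visualization
    PALETTE = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
               [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.jpg',
                         seg_map_suffix='.png',
                         reduce_zero_label=False,
                         **kwargs)
```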
Thank you very much for your response. I will deploy the network running on the UMD dataset based on your suggestions. At the same time, I have one more question I'd like to ask: how can networks like ZegCLIP and SAN be applied to the AGD20K dataset? I noticed that networks like ZegCLIP have specific requirements for datasets, particularly regarding the color mapping of different class labels. For example, the "car" class is represented by the color [0, 64, 96]. However, the AGD20K dataset uses sparse annotation, where even different affordance labels are assigned values close to 255. Given this, I would like to ask how ZegCLIP distinguishes between different affordance labels when reproducing the network on the AGD20K dataset. If possible, would it be possible for you to provide the code used for reproduction for reference and study? Thank you in advance for your reply!
The color mapping is specific to each dataset and only used for visualization. Any typical or open-vocabulary segmentation network can be used for AGD20K, as long as the raw output is constrained between 0 and 1, since AGD20K is annotated with soft labels in this range. For our experiments, we use the official ZegCLIP code, which applies a Sigmoid function at the end.
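To make the output constraint concrete, here is a minimal sketch (my own illustration, not the official ZegCLIP code): the per-pixel logits are squashed to (0, 1) with a Sigmoid and compared against the soft labels; the binary cross-entropy loss below is just one plausible choice.

```python
# Minimal sketch: constrain per-pixel affordance predictions to (0, 1) with a
# Sigmoid so they can be trained/evaluated against AGD20K's soft labels.
# The loss choice (BCE) is an illustrative assumption.
import torch
import torch.nn.functional as F

logits = torch.randn(2, 36, 224, 224)       # (batch, affordance classes, H, W)
soft_labels = torch.rand(2, 36, 224, 224)   # soft labels already lie in [0, 1]

pred = torch.sigmoid(logits)                # raw output constrained to (0, 1)
loss = F.binary_cross_entropy(pred, soft_labels)
```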
Hello, and thank you again for your excellent work! May I ask if you could provide the UMD dataset used for training and testing the network? After downloading the original UMD dataset, I found that it cannot be directly fed into the network for reproduction. Thank you in advance for your reply!