Regarding AffordanceLLM

JasonQSY / 3DOI

[ICCV 2023] Understanding 3D Object Interaction from a Single Image


Apologies for asking here about another of your significant works. Since I have no way to contact you separately, I am posting here after seeing the related issue.

I am interested in another paper of yours, AffordanceLLM: Grounding Affordance from Vision Language Models, and am currently working on its implementation.

Thankfully, I was able to download the hard split of the benchmark, but I wonder how to generate the data for the Easy and Hard splits.

My questions concern the following part of the paper:

Easy split

Unseen split of AGD20K, 1135/540 images for train and test in the fully supervised setting. (Where did the 1135 images come from?)
13,323/540 images for the weakly supervised setting. (Unseen/trainset/exocentric///.jpg / Unseen/testset/egocentric///.jpg) Is this correct?

Hard split

868/807 images: the hard split of the benchmark. (This one is fine; it is the uploaded hard_split_tar.)
11,889/807 images for the weakly supervised setting. (Where did the 11,889 images come from? Are they 50% of the images in Seen/trainset/exocentric, selected at random?)

Could you please explain the weakly supervised part in detail (which images you use, and so on)? And if you use data from the weakly supervised setting, how did you obtain the ground-truth data needed for affordance prediction and learning?

For this data you will need to re-process the dataset. I have released some data processing and baseline code at https://github.com/JasonQSY/AffordanceLLM to make that possible. I'm sorry that I can't debug it to make sure the code is easy to run; I graduated recently and lost access to many of the machines I used. If you find any issues, I would appreciate a PR. If you plan to release your implementation of AffordanceLLM in the future, I'm happy to link it on the project website and acknowledge your contribution.

Fully-supervised setting:

Hard split: You're right.
Easy split: set obj_list=locate_seen_obj_list at https://github.com/JasonQSY/AffordanceLLM/blob/main/data_processing/build_llava_agd20k.py#L133 and obj_list=locate_unseen_obj_list at https://github.com/JasonQSY/AffordanceLLM/blob/main/data_processing/build_llava_agd20k.py#L145C11-L145C42, then run the script to generate the JSON file.

Weakly-supervised setting:

Easy split: It's LOCATE unseen. Set --divide=Unseen.
Hard split: Take a look at the code here and set --divide=Generalization.
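
To check a regenerated split against the image counts quoted above (for example 13,323/540 for the weakly supervised Easy split), one option is simply to count the image files under each split directory. The following is a minimal sketch, not part of the released code: the helper names count_images and count_per_class are hypothetical, and it assumes the images sit in nested subdirectories below the split root.

```python
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count image files anywhere below root (recursively)."""
    root = Path(root)
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def count_per_class(root):
    """Return {first-level subdirectory name: image count below it}."""
    root = Path(root)
    return {
        d.name: count_images(d)
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

For instance, count_images("Unseen/trainset/exocentric") and count_images("Unseen/testset/egocentric") could be compared with the train and test numbers in the paper, and count_per_class on the same directories shows which first-level subdirectories account for any mismatch.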
