askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Question about the provided data 'feat_conv.pt' #109

Closed Jiahui-3205 closed 2 years ago

Jiahui-3205 commented 2 years ago

Hi, I am trying to use the ALFRED data for some experiments, and I am a little confused about the provided ResNet-18 features. After loading a trajectory, the ResNet-18 feature tensor has shape [38, 512, 7, 7], and the trajectory contains 38 low-level actions. Does each low-level action correspond to one [1, 512, 7, 7] feature (e.g., does action[0] correspond to feat_conv[0])?

Another question: I also found that one low-level action is matched to several images. For example, the images from 0.png to 10.png all point to low-level action 0. Are all of these images encoded into the corresponding [1, 512, 7, 7] feature, or is a single image (like 0.png) encoded as that feature?

Thank you

MohitShridhar commented 2 years ago

@Jiahui-3205, please see Q7 and Q4 in the FAQ: https://github.com/askforalfred/alfred/blob/master/doc/FAQ.md
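To see how many frames map to each low-level action, you can group the image records by their action index. This is only a minimal sketch: it assumes each entry of the trajectory's image list is a dict with `low_idx` and `image_name` fields, and it runs on made-up records instead of a real trajectory file. The helper name `group_frames_by_low_action` is hypothetical and not part of the ALFRED codebase.

```python
from collections import defaultdict


def group_frames_by_low_action(image_records):
    """Map each low_idx to the ordered list of image names recorded for it.

    Assumption: every record is a dict with "low_idx" and "image_name" keys.
    """
    groups = defaultdict(list)
    for record in image_records:
        groups[record["low_idx"]].append(record["image_name"])
    return dict(groups)


# Synthetic stand-in for the image list of one trajectory (not real data).
image_records = [
    {"low_idx": 0, "image_name": "000000000.png"},
    {"low_idx": 0, "image_name": "000000001.png"},
    {"low_idx": 0, "image_name": "000000002.png"},
    {"low_idx": 1, "image_name": "000000003.png"},
    {"low_idx": 1, "image_name": "000000004.png"},
]

groups = group_frames_by_low_action(image_records)
for low_idx, names in sorted(groups.items()):
    print(low_idx, len(names), names[0])
```

If the number of groups equals the first dimension of the loaded feature tensor, that would be consistent with one feature per low-level action rather than one per frame.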


Jiahui-3205 commented 2 years ago

Thank you for the reference! One more question: if we only know the observations after each low-level action, how can we know the initial agent observation (i.e., the observation before the first low-level action takes place)? Thanks

MohitShridhar commented 2 years ago

@Jiahui-3205, good question. If I remember correctly, the first frame in Full Dataset [images] might be almost equivalent to the initial observation because the smoothed actions are so tiny.
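As a rough sketch of that idea, you could look up the first frame recorded for each low-level action and treat the very first frame as an approximation of the initial observation. This assumes the image records are stored in playback order and carry `low_idx` and `image_name` fields. The records below are made up, and `first_frame_index_per_low_action` is a hypothetical helper, not a function from the repo.

```python
def first_frame_index_per_low_action(image_records):
    """Return {low_idx: index of the first frame recorded for that action}.

    Assumption: records are in playback order and each has a "low_idx" key.
    """
    first = {}
    for frame_idx, record in enumerate(image_records):
        # setdefault keeps the earliest frame index seen for each action.
        first.setdefault(record["low_idx"], frame_idx)
    return first


# Synthetic stand-in for the image list of one trajectory (not real data).
image_records = [
    {"low_idx": 0, "image_name": "000000000.png"},
    {"low_idx": 0, "image_name": "000000001.png"},
    {"low_idx": 0, "image_name": "000000002.png"},
    {"low_idx": 1, "image_name": "000000003.png"},
    {"low_idx": 1, "image_name": "000000004.png"},
]

first = first_frame_index_per_low_action(image_records)

# The earliest frame overall is the closest available proxy for the
# observation before any action was taken (an approximation only).
approx_initial_frame = image_records[min(first.values())]["image_name"]
print(first, approx_initial_frame)
```

The same frame indices could be used to index per-frame features, if you have them, instead of image names.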

If for some reason you need the exact initial observation, you can use augment_trajectories.py to iterate through all the scenes in the dataset and save the desired frames.

Jiahui-3205 commented 2 years ago

Will try it, and thanks for answering!