Question about 'text_masks' in pose trace data

google-research-datasets / RxR

Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual perceptions of the annotators.

Creative Commons Attribution 4.0 International

Question about 'text_masks' in pose trace data #4

Closed by awesomericky 3 years ago

awesomericky commented 3 years ago

Hello,

I am curious about the 'text_masks' feature in the pose trace data. I checked the data and found that the first word of the instruction is never masked, while the masks of the other words change monotonically, which is reasonable as the agent progresses toward the goal. Why is the first word left unmasked at every viewpoint?

peteanderson80 commented 3 years ago

That is because we are using BERT tokenization and embeddings, so the first word is the CLS token which contains a representation for the entire instruction. We therefore decided to never mask that token.
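To make the pattern described above concrete, here is a minimal sketch using synthetic data rather than anything loaded from RxR. It assumes, purely for illustration, that 'text_masks' holds one row per pose and one column per token, that a value of 1 means "not masked", and that column 0 is the CLS position. The token list, the mask values, and both helper functions are hypothetical.

```python
# Toy illustration only: synthetic data, not loaded from the RxR files.
# Assumptions (not confirmed by the dataset docs): one mask row per pose,
# one column per token, and 1 means "not masked".


def cls_always_unmasked(text_masks):
    """Return True if column 0 (the assumed CLS position) is 1 in every row."""
    return all(row[0] == 1 for row in text_masks)


def masks_grow_monotonically(text_masks):
    """Return True if a token unmasked at one pose stays unmasked at later poses."""
    return all(
        cur >= prev
        for prev_row, cur_row in zip(text_masks, text_masks[1:])
        for prev, cur in zip(prev_row, cur_row)
    )


# Hypothetical tokens for a short instruction, with BERT-style special tokens.
tokens = ["[CLS]", "walk", "past", "the", "sofa", "[SEP]"]

# Hypothetical masks for four poses along the path.
text_masks = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(cls_always_unmasked(text_masks))
print(masks_grow_monotonically(text_masks))
```

The same two checks could be run on a real pose trace once the actual array layout and mask convention have been confirmed against the data.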

guhur commented 2 years ago

Could you specify which tokenizer you used? I tried Hugging Face's BertTokenizer, but I did not obtain the same number of tokens as you. Thanks.