google-research-datasets / RxR

Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and Telugu, and 126k navigation-following demonstrations. Both annotation types include dense spatiotemporal alignments between the text and the visual perceptions of the annotators.
Creative Commons Attribution 4.0 International

The number of instructions seems inconsistent with the paper #6

Open Jackie-Chou opened 3 years ago

Jackie-Chou commented 3 years ago

Hi, I am trying to train my own model on RxR, but after downloading the data (guide data only), I found that the number of instructions in the provided file, rxr_train_guide.jsonl, does not match the numbers in your paper. Specifically, the paper says there are 11,089 paths in the training set, but the file contains only 8,000+ unique path_ids (I considered English data only). In addition, Table 5 of the paper reports 42K training pairs in the Guide data for each language, but I get 26k from the file. I am confused about which numbers are right. Please help. Thanks in advance.

peteanderson80 commented 3 years ago

Hey,

There should be ~11089 paths in the training set including all three languages. But, if you are only looking at English, the number of paths will be only 8824. While most paths are annotated in all three languages, there is a subset of paths that are only annotated in one language to give more variation in paths. See the last paragraph of Section 4, subsection 'Guide Task' in the paper.

The total number of training pairs in the Guide data is ~79467, which is the 'train' split of the 126k instructions detailed in Table 2 (the rest are in the val-seen, val-unseen, test-standard and test-challenge splits). This is about 26k per language, so your numbers sound right.
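For anyone who wants to reproduce these counts locally, here is a minimal sketch. It assumes each line of rxr_train_guide.jsonl is a JSON object with a `path_id` field and a `language` field holding a locale code such as `en-US` or `hi-IN`; those field names and values are assumptions, so check them against your download. The helper `summarize_guide_file` is a hypothetical name, not part of the RxR release, and the sample records below are toy data.

```python
import json
from collections import defaultdict


def summarize_guide_file(lines):
    """Count instruction pairs and unique paths per language.

    `lines` is any iterable of JSON strings, e.g. an open file handle.
    Assumes (not confirmed by the thread) that each record has
    'path_id' and 'language' fields, with locale-style language codes.
    """
    pairs = defaultdict(int)   # language -> number of instructions
    paths = defaultdict(set)   # language -> set of unique path_ids
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Collapse locale variants, e.g. "en-US" and "en-IN" both become "en".
        lang = record["language"].split("-")[0]
        pairs[lang] += 1
        paths[lang].add(record["path_id"])
    # Paths shared across languages are counted once in the overall total.
    all_paths = set()
    for ids in paths.values():
        all_paths |= ids
    paths_per_lang = {lang: len(ids) for lang, ids in paths.items()}
    return dict(pairs), paths_per_lang, len(all_paths)


# Toy records for illustration only; these are not real RxR data.
sample = [
    '{"path_id": 1, "language": "en-US"}',
    '{"path_id": 1, "language": "hi-IN"}',
    '{"path_id": 2, "language": "en-IN"}',
    '{"path_id": 3, "language": "te-IN"}',
]
pairs, paths_per_lang, total_paths = summarize_guide_file(sample)
print(pairs, paths_per_lang, total_paths)
```

To run it on the real file, pass an open handle instead of the toy list, e.g. `with open("rxr_train_guide.jsonl") as f: summarize_guide_file(f)`. The overall unique-path count is the one to compare against ~11089, while the per-language counts correspond to figures like 8824 for English.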

Jackie-Chou commented 3 years ago

What about the 42k training pairs in Table 5? Should that actually be 26k?