YicongHong / Recurrent-VLN-BERT

Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation

Why split instructions? #5

Closed jasonppy closed 3 years ago

jasonppy commented 3 years ago

Hi Yicong,

Thanks for open-sourcing your code!

I wonder why you split the instructions in /r2r_src/env.py, lines 129 to 142:

# Split multiple instructions into separate entries
for j, instr in enumerate(item['instructions']):
    try:
        new_item = dict(item)
        new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
        new_item['instructions'] = instr

        ''' BERT tokenizer '''
        instr_tokens = tokenizer.tokenize(instr)
        padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
        new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)

        if new_item['instr_encoding'] is not None:  # Filter the wrong data
            self.data.append(new_item)
            scans.append(item['scan'])
    except:
        continue

This is done for the original path-instruction data but not for prevalent_aug.json, and I wonder why. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN jobs while the desired path is always the complete path, how can an agent (or a human) possibly follow them?

Best, Jason

YicongHong commented 3 years ago

Hi Jason,

The code doesn't actually break up any instruction. Each entry in the R2R JSON data file has three different instructions describing the same path (see the example below). So the loop for j, instr in enumerate(item['instructions']): simply separates the three instructions into three training samples, i.e., the pairs path H - instr A, path H - instr B, and path H - instr C.

{"distance": 10.84, "scan": "XcA2TqTSSAj", "path_id": 3100, "path": ["eb7c8095d2514ab7a84732fa41ed3594", "bf22e389bd754181924923b4f5e0fe02", "c43ed5913b6d45e5aa3b782bfa860805", "ec0421cc61c64f6f8692d327d6838fd5", "b6985ed68dd8405e969fcbbcd6dfcc42", "213a5fa207dc491894df2f4405c40838", "8fcf0b6f46b8465ab762e45a0060f6f0"], "heading": 1.804, "instructions": [ "Walk out of the bathroom and turn left. Walk along the hallway passed the white painting and the other bathroom. Turn left towards the stairs. Walk down three of the stairs and wait on there. ", "Exit the bathroom to the bedroom. Exit the bedroom using the door on the left then go straight until you get to the stairs and wait on the second step. ", "Walk out of the bathroom into the bedroom and turn left. Continue out the bedroom door, turning left to wait at the top of the stairs. "]}
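To make the pairing concrete, here is a minimal sketch of the same splitting step with the tokenizer left out. The function name split_instructions, the shortened viewpoint IDs, and the "instr A/B/C" strings are placeholders for illustration, not part of the repository or the dataset.

```python
def split_instructions(item):
    """Return one sample per instruction; every sample keeps the full path."""
    samples = []
    for j, instr in enumerate(item["instructions"]):
        new_item = dict(item)  # shallow copy, so scan and path are shared
        new_item["instr_id"] = "%s_%d" % (item["path_id"], j)
        new_item["instructions"] = instr  # a single string, not a list
        samples.append(new_item)
    return samples


# Hypothetical, shortened entry in the same shape as the R2R example above.
entry = {
    "scan": "XcA2TqTSSAj",
    "path_id": 3100,
    "path": ["viewpoint_0", "viewpoint_1", "viewpoint_2"],  # placeholder IDs
    "instructions": ["instr A", "instr B", "instr C"],      # placeholder text
}

samples = split_instructions(entry)
for s in samples:
    print(s["instr_id"], s["path"], s["instructions"])
```

Each of the three resulting samples pairs one complete instruction with the complete path, so no instruction is cut into pieces.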

Each entry in the PREVALENT data has only one instruction (because the instructions are generated by the Speaker without randomness), so no splitting is required.

Cheers, Yicong

jasonppy commented 3 years ago

Thanks for your timely reply!