askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Is there a script for creating train/valid/test splits? #57

Closed: wcarvalho closed this issue 3 years ago

wcarvalho commented 3 years ago

I'd like to make smaller splits for fast iteration.

Cheers :)
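One way to build smaller splits is sketched below. It assumes the splits are available as a JSON-style mapping from split name to a list of entries; that layout, the function name `subsample_splits`, and the entry fields are assumptions for illustration, not something confirmed in this thread.

```python
import random


def subsample_splits(splits, max_per_split, seed=0):
    """Return a copy of `splits` with at most `max_per_split` entries per split.

    `splits` is assumed (hypothetically) to map split names such as "train"
    or "valid_seen" to lists of entries. The input is not modified.
    """
    rng = random.Random(seed)  # fixed seed so the small splits are reproducible
    small = {}
    for name, entries in splits.items():
        k = min(max_per_split, len(entries))
        small[name] = rng.sample(entries, k)  # sample without replacement
    return small


# Toy data standing in for a real splits file (hypothetical entries).
example_splits = {
    "train": [{"task": f"train_task_{i}"} for i in range(10)],
    "valid_seen": [{"task": "valid_task_0"}],
}
small_splits = subsample_splits(example_splits, max_per_split=3)
print({name: len(entries) for name, entries in small_splits.items()})
```

The result can be written back out with `json.dump` and used in place of the full splits file, provided the training script accepts a path to the splits.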

wcarvalho commented 3 years ago

Further inspecting the code, we have a folder, e.g. json_feat_2.1.0, with the directories tests_seen, tests_unseen, train, valid_seen, and valid_unseen. It looks like you copy the contents of those directories into the parent directory json_feat_2.1.0? Why are you doing this?

MohitShridhar commented 3 years ago

@wcarvalho is this for training? Have you checked out the --fast_epoch option for train_seq2seq.py? See this.

MohitShridhar commented 3 years ago

For the second comment, can you point to what exactly you are referring to in the code? Thanks!

wcarvalho commented 3 years ago

I figured out how to reduce train/valid/test. As for the "copying code", I'm talking about this: https://github.com/askforalfred/alfred/blob/d12caad32e61c7c6f62901b638bfebdb522a2a7b/data/preprocess.py#L81

It looks like you are literally copying the json data to a new path with a few new keys.

MohitShridhar commented 3 years ago

This is saving preprocessed data to prepare for training.

The raw traj_data.json files don't contain tokenized strings, etc. The preprocessing tokenizes strings, adds special tokens, and writes the result to another json file, which is ready to go for training. These new json files are loaded on the fly during the train run.
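The flow described above can be sketched roughly as follows. This is not the code in data/preprocess.py: the field names `goal` and `goal_tokens`, the whitespace tokenizer, the `<<stop>>` token, and the helper names are all assumptions made for illustration.

```python
import json
import os


def tokenize(text):
    """Very simple tokenizer (an assumption): lowercase, drop '.' and ',', split."""
    return text.lower().replace(".", " ").replace(",", " ").split()


def preprocess_traj(raw, stop_token="<<stop>>"):
    """Return a copy of `raw` with one extra key holding the tokenized goal.

    `goal` and `goal_tokens` are hypothetical key names for this sketch.
    """
    out = dict(raw)  # shallow copy, so the raw data is left untouched
    out["goal_tokens"] = tokenize(raw["goal"]) + [stop_token]
    return out


def save_preprocessed(raw_path, out_path):
    """Read a raw json file, preprocess it, and write the result to a new path."""
    with open(raw_path) as f:
        raw = json.load(f)
    out_dir = os.path.dirname(out_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(out_path, "w") as f:
        json.dump(preprocess_traj(raw), f)


print(preprocess_traj({"goal": "Put the mug in the sink."})["goal_tokens"])
```

Writing the tokenized version to a separate file means the training loop only needs to load json at run time and never re-tokenizes the raw annotations.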

wcarvalho commented 3 years ago

makes sense - thanks!