About data files used for the FewRel dataset

amazon-science / tanl

Structured Prediction as Translation between Augmented Natural Languages

Apache License 2.0


About data files used for the FewRel dataset #6

Closed: wangpf3 closed this issue 3 years ago

wangpf3 commented 3 years ago

Hi! I'm wondering how to prepare the data files for the FewRel dataset. Do we use the full train_wiki.json from https://github.com/thunlp/FewRel/tree/master/data as the training split for meta-training, and the full val_wiki.json for evaluation (both support and query)? I'm confused because the fewrel_meta config also specifies do_eval=True. In that case, which dev split would the code use? Would appreciate any guidance on this!

benathi commented 3 years ago

val_wiki.json is used for the episodic evaluation (it should be renamed to FewRelEpisodic.json). In each episode, part of the data serves as the support set, which we fine-tune on, and the query set is then used for evaluation. The code loads the support and query sets automatically, so only this one file is needed.
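To make the support/query split concrete, here is a rough sketch of how one episode could be drawn from a file in the FewRel format (a JSON object mapping each relation ID to a list of instances). This is not the TANL loader: `load_relations`, `sample_episode`, and the `n_way`, `k_shot`, `n_query` values are hypothetical names and settings chosen for illustration, and the actual code may sample differently.

```python
import json
import random


def load_relations(path):
    """Read a FewRel-style file: {relation_id: [instance, ...]}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def sample_episode(data, n_way=5, k_shot=1, n_query=1, rng=None):
    """Draw one N-way K-shot episode (hypothetical helper, not TANL's code).

    Returns (support, query), each a list of (relation_id, instance) pairs.
    Support and query instances of a relation never overlap.
    """
    if rng is None:
        rng = random.Random()
    # Pick the relations that define this episode's classification task.
    relations = rng.sample(sorted(data), n_way)
    support, query = [], []
    for rel in relations:
        # Sample without replacement so support and query are disjoint.
        picked = rng.sample(data[rel], k_shot + n_query)
        support.extend((rel, inst) for inst in picked[:k_shot])
        query.extend((rel, inst) for inst in picked[k_shot:])
    return support, query


# Toy stand-in for the real file, so the sketch runs without any download.
toy_data = {
    f"P{i}": [{"tokens": ["example", str(j)], "rel": f"P{i}"} for j in range(4)]
    for i in range(6)
}
support_set, query_set = sample_episode(
    toy_data, n_way=3, k_shot=2, n_query=1, rng=random.Random(0)
)
print(len(support_set), "support instances,", len(query_set), "query instances")
```

With the real data, `load_relations("FewRelEpisodic.json")` would replace `toy_data`, assuming the renamed file sits in the working directory. The model would be fine-tuned on `support_set` and scored on `query_set`, then the process repeated over many episodes.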