AmitMY / chimera

Code from the paper "Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation" (NAACL 2019).
https://arxiv.org/abs/1904.03396
MIT License
127 stars 24 forks

how to test on our own data #15

Open FazeleTavakoli opened 4 years ago

FazeleTavakoli commented 4 years ago

Hi,

We would appreciate code for testing the model on a different dataset, so that we can easily get a probable plan from a set of triples and generate a verbalization from it.

Thanks in advance.

AmitMY commented 4 years ago

The best solution would be to add a test_reader field to the Config class and use it on this line: https://github.com/AmitMY/chimera/blob/master/process/pre_process.py#L9 (some tweaks are needed, such as applying it only to the test set and not to train or dev).
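
As a rough illustration of that idea, here is a minimal, self-contained sketch. It does not use chimera's real classes: `Config` below is a hypothetical stand-in that models only the reader fields, `reader_for` is a helper name invented for this example, and `WebNLGReader` and `CustomReader` are placeholders. The point is only the fallback logic: use the test reader for the test split when one is given, otherwise use the normal reader.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Config:
    """Hypothetical stand-in for chimera's Config; only reader fields are modeled."""
    reader: Any                        # reader used for train and dev
    test_reader: Optional[Any] = None  # optional override used only for the test split

    def reader_for(self, split: str) -> Any:
        """Return the reader for a split ('train', 'dev' or 'test')."""
        if split == "test" and self.test_reader is not None:
            return self.test_reader
        return self.reader


class WebNLGReader:   # placeholder, not the real WebNLGDataReader
    pass


class CustomReader:   # placeholder for your own dataset reader
    pass


config = Config(reader=WebNLGReader, test_reader=CustomReader)
print(config.reader_for("train").__name__)  # WebNLGReader
print(config.reader_for("test").__name__)   # CustomReader
```

Defaulting `test_reader` to `None` means existing configurations that only pass `reader` would keep behaving as before.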


The simple solution is to run the training code and then swap in a different test set.

You can run this:

config = Config(reader=WebNLGDataReader,
                planner=neural_planner,
                reg=BertREG)
res = MainPipeline.mutate({"config": config}).execute("WebNLG", cache_name="WebNLG")

Once the model finishes training on the training dataset, you can instantiate a new TestCorpus:

config = Config(reader=YourCustomDatasetReader)
test = TestCorpusPreProcessPipeline.mutate({"config": config}).execute("CustomName", cache_name="CustomName")

And finally, combine the two for translation:

translate = TranslatePipeline.mutate({**res, "test-corpus": test["test-corpus"]}).execute(...)
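
The combination step amounts to building one mapping that carries everything from training but with the test corpus replaced. A minimal sketch with plain dicts, assuming the pipeline results behave like mappings; apart from "test-corpus", the keys and values here are made up for illustration and are not chimera's actual result keys:

```python
# Stand-ins for the two pipeline results (illustrative keys and values only).
res = {"config": "trained-config", "model": "trained-model", "test-corpus": "WebNLG test"}
test = {"test-corpus": "custom test"}

# ** unpacks every key of res; the explicit key written after it wins on conflict.
merged = {**res, "test-corpus": test["test-corpus"]}

print(merged["test-corpus"])  # custom test
print(merged["model"])        # trained-model
```

Note that dict unpacking needs two asterisks; a single asterisk next to a `key: value` pair inside braces is a syntax error in Python.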

The more advanced solution, which is also more extensible, is to create your own pipeline from the parts in the process directory.

Here is an example: https://github.com/AmitMY/chimera/blob/master/experiments.py

This file alone runs at least 16 experiments that I can remember, varying parameters such as planners, regs, and decoding methods. It can easily be modified to load whatever train or dev set you like and to try whatever configuration you want.
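
For a sense of how such a sweep can be organized, here is a small sketch that is not taken from experiments.py. It enumerates every combination of a few options with `itertools.product`; the option names are placeholders, so substitute the planners, regs, and decoding methods you actually use.

```python
from itertools import product

# Placeholder option names, chosen only for illustration.
planners = ["planner_a", "planner_b"]
regs = ["reg_a", "reg_b"]
decoders = ["decoder_a", "decoder_b"]

# One dict per experiment: the cross product of all options.
experiments = [
    {"planner": p, "reg": r, "decoding": d}
    for p, r, d in product(planners, regs, decoders)
]

for i, exp in enumerate(experiments, start=1):
    # In a real run, build a Config from exp and execute the pipeline here.
    print(i, exp)
```

With two choices for each of three options this yields eight experiments; adding a value to any list grows the sweep without touching the loop.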