Coverage of the domain specific language

crazydonkey200 / neural-symbolic-machines

Neural Symbolic Machines is a framework to integrate neural networks and symbolic representations using reinforcement learning, with applications in program synthesis and semantic parsing.

Apache License 2.0


Coverage of the domain specific language #17

Closed: berlino closed this issue 5 years ago

berlino commented 5 years ago

Hi, Chen:

Could you share the coverage of your domain-specific language for WikiTable and WikiSQL? I am trying to reproduce this language, and my system covers around 74% of the questions in the WikiTable training set. I want to check whether that matches your number.

Thanks!

berlino commented 5 years ago

By coverage, I mean the coverage obtained when you conduct search and pruning with a random policy.

crazydonkey200 commented 5 years ago

Sorry for the late reply. The coverage (the fraction of questions for which at least one program is found) is:

WikiTable: 73.24% after random exploration, 76.99% after training
WikiSQL: 65.45% after random exploration, 91.46% after training
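A minimal sketch of the coverage metric as defined above, assuming the search results are stored as a dict from question ID to the list of programs found for that question. The names `coverage` and `programs_found`, and the placeholder program strings, are hypothetical and not taken from the repository.

```python
from typing import Dict, List


def coverage(programs_found: Dict[str, List[str]]) -> float:
    """Fraction of questions with at least one program found (hypothetical helper)."""
    if not programs_found:
        return 0.0
    covered = sum(1 for progs in programs_found.values() if progs)
    return covered / len(programs_found)


# Toy data: question ID -> programs that produced the correct answer.
# The program strings are placeholders, not real DSL programs.
found = {
    "q1": ["prog_a"],
    "q2": [],
    "q3": ["prog_b", "prog_c"],
    "q4": [],
}
print(f"{coverage(found):.2%}")  # 50.00%
```

A question counts as covered regardless of how many programs were found for it, which matches the "at least one program" definition.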

Coverage after random exploration differs from coverage after training because we keep running systematic exploration during training, so more programs are discovered for each question. The gap is larger on WikiSQL because random exploration generated only 1k programs per question for WikiSQL, versus 50k for WikiTable.
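The simplified sketch below illustrates why coverage can only stay the same or grow during training: if newly discovered programs are added to a per-question buffer and never discarded, a question covered after random exploration stays covered. The buffer layout and the function names `merge_programs` and `buffer_coverage` are assumptions for illustration, not the repository's implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_programs(buffer: Dict[str, Set[str]],
                   new_finds: Iterable[Tuple[str, str]]) -> None:
    """Add newly found (question_id, program) pairs; nothing is ever removed."""
    for qid, program in new_finds:
        buffer.setdefault(qid, set()).add(program)


def buffer_coverage(buffer: Dict[str, Set[str]], question_ids: List[str]) -> float:
    """Fraction of the given questions that have at least one buffered program."""
    if not question_ids:
        return 0.0
    return sum(1 for q in question_ids if buffer.get(q)) / len(question_ids)


questions = ["q1", "q2", "q3", "q4"]
buffer: Dict[str, Set[str]] = {}

# Hypothetical programs found by random exploration before training.
merge_programs(buffer, [("q1", "prog_a")])
print(buffer_coverage(buffer, questions))  # 0.25

# Hypothetical programs found by continued exploration during training.
merge_programs(buffer, [("q1", "prog_b"), ("q3", "prog_c")])
print(buffer_coverage(buffer, questions))  # 0.5
```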

berlino commented 5 years ago

Thanks for the numbers!

By the way, if I understand correctly, your domain-specific language does not support the union ("or") operation, which was used in previous work such as https://arxiv.org/pdf/1508.00305.pdf (I suppose intersection ("and") can be achieved by recursively calling the filter function). I was wondering whether you left this operation out because it tends to trigger too many spurious programs.

crazydonkey200 commented 5 years ago

You are correct that the DSL doesn't support "or"; it can chain multiple filter functions to implement "and".
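To make the "and" point concrete, here is a toy sketch in which conjunction is expressed by nesting one filter call inside another. `filter_eq` and the table are hypothetical stand-ins, not functions or data from the actual DSL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_eq(rows: List[Row], column: str, value: Any) -> List[Row]:
    """Hypothetical filter: keep rows whose `column` equals `value`."""
    return [r for r in rows if r[column] == value]


table = [
    {"nation": "France", "year": 2004, "medal": "gold"},
    {"nation": "France", "year": 2008, "medal": "silver"},
    {"nation": "Spain", "year": 2004, "medal": "gold"},
]

# "and": apply the second filter to the output of the first (intersection).
both = filter_eq(filter_eq(table, "nation", "France"), "year", 2004)
print(both)  # [{'nation': 'France', 'year': 2004, 'medal': 'gold'}]

# "or" needs a union of two row sets; plain Python is used here because
# nesting filters alone cannot express it.
either = [r for r in table if r["nation"] == "Spain" or r["year"] == 2008]
```

Each nested filter can only shrink the row set, which is why chaining filters gives intersection but never union.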

"or" was left out for simplicity. I didn't check how many more spurious programs it would introduce, but you might be right. I used a simple rule (see here) to preprocess sentences containing "or", such as "which one is better, A or B": it merges the two entities ("A" and "B") into a single list of entities "[A, B]". So in these simple cases, the model can still handle semantics that require "or".
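A rough sketch of the kind of preprocessing rule described above, assuming entities have already been linked to token spans `(start, end, value)` with an exclusive `end`. The function `merge_or_entities` is hypothetical; the actual rule in the repository may match different patterns.

```python
from typing import List, Tuple, Union

# (start_token, end_token_exclusive, value); value becomes a list after merging.
Entity = Tuple[int, int, Union[str, List[str]]]


def merge_or_entities(tokens: List[str], entities: List[Entity]) -> List[Entity]:
    """Hypothetical rule: merge 'A or B' entity pairs into one list entity [A, B]."""
    ents = sorted(entities, key=lambda e: e[0])
    merged: List[Entity] = []
    i = 0
    while i < len(ents):
        start, end, value = ents[i]
        # Merge only when the next entity starts right after a single "or" token.
        if (i + 1 < len(ents)
                and ents[i + 1][0] == end + 1
                and tokens[end].lower() == "or"):
            nxt = ents[i + 1]
            merged.append((start, nxt[1], [value, nxt[2]]))
            i += 2
        else:
            merged.append((start, end, value))
            i += 1
    return merged


tokens = "which one is better , paris or rome".split()
entities = [(5, 6, "paris"), (7, 8, "rome")]
print(merge_or_entities(tokens, entities))  # [(5, 8, ['paris', 'rome'])]
```

This sketch only merges a pair of adjacent entities; a list such as "A, B or C" would need an extra pass.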

Hope this helps clarify things.