Good question. I think you have two options:

1. Write a `DomainLanguage` that can generate any token sequence recursively.
2. Write a new `State` object that doesn't depend on a grammar.

Either or both of these would be great to have in the repo, if you get them working. Some more detail on each:
For the `DomainLanguage` option, you'd write something like this:

```python
from typing import List, Set

from allennlp.semparse.domain_languages import DomainLanguage, predicate

Token = str  # if you want to have nicer type annotations


class TokenSequenceLanguage(DomainLanguage):
    def __init__(self, vocab: Set[str]) -> None:
        super().__init__(start_types={List[Token]})
        for item in vocab:  # not quite right, but you get the idea
            self.add_constant(item, item)
        # you could also take the current sentence in here, to handle copying separately

    @predicate
    def add_token(self, token: Token, token_list: List[Token]) -> List[Token]:
        return [token] + token_list

    @predicate
    def empty_list(self) -> List[Token]:
        return []
```
Then you'll get programs that look something like `add_token('print', add_token('(', add_token('"hello"', add_token('"world"', add_token(')', [])))))`. The benefit of this is that it's easy; the drawback is that it adds unnecessary hierarchy. The model can probably memorize this hierarchy reasonably well, though: doing it this way, at each timestep you decide whether to generate another token or stop, then you decide what token to generate. It's completely right-branching, so there shouldn't be much difference at all between a typical seq2seq model and this.
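For concreteness, here's a hedged usage sketch for the language above. The vocabulary and logical form are invented; `execute` and `logical_form_to_action_sequence` are the generic `DomainLanguage` methods, so this assumes the class is defined as sketched.

```python
language = TokenSequenceLanguage({"hello", "world"})

# In DomainLanguage's lisp syntax, a zero-argument predicate like empty_list
# behaves like a constant, so it appears as a bare name.
logical_form = "(add_token hello (add_token world empty_list))"

# Evaluate the program directly.
print(language.execute(logical_form))  # ['hello', 'world']

# Convert the same program into the sequence of production-rule actions that
# the transition-based decoder actually predicts.
print(language.logical_form_to_action_sequence(logical_form))
```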
For the `State` option, you'd write a new `State` object that always returns the same set of actions at every timestep. This actually might also be very easy, though it looks like we need to move `get_valid_actions` to the base `State` class; it's currently only defined for `GrammarBasedState`. If you do this, you don't need a `DomainLanguage` at all.
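To make that concrete, here's a minimal hypothetical sketch; the class, its constructor arguments, and the end-of-sequence convention are all invented for illustration, and it defines `get_valid_actions` directly on the subclass since the base `State` doesn't have it yet.

```python
from typing import List

import torch
from allennlp.state_machines.states import State


class UnconstrainedState(State["UnconstrainedState"]):
    """Hypothetical State whose set of valid actions is the same at every timestep."""

    def __init__(self,
                 batch_indices: List[int],
                 action_history: List[List[int]],
                 score: List[torch.Tensor],
                 all_action_indices: List[int],
                 end_index: int) -> None:
        super().__init__(batch_indices, action_history, score)
        # With no grammar, the full action vocabulary is carried along unchanged.
        self.all_action_indices = all_action_indices
        # A designated end-of-sequence action tells the decoder when to stop.
        self.end_index = end_index

    def get_valid_actions(self) -> List[int]:
        # Every action is valid at every timestep.
        return self.all_action_indices

    def is_finished(self) -> bool:
        if len(self.batch_indices) != 1:
            raise RuntimeError("is_finished() is only defined on unbatched states")
        history = self.action_history[0]
        return bool(history) and history[-1] == self.end_index

    @classmethod
    def combine_states(cls, states: List["UnconstrainedState"]) -> "UnconstrainedState":
        batch_indices = [i for state in states for i in state.batch_indices]
        action_histories = [h for state in states for h in state.action_history]
        scores = [s for state in states for s in state.score]
        return cls(batch_indices, action_histories, scores,
                   states[0].all_action_indices, states[0].end_index)
```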
If you have questions about either of these, I'm happy to answer them. And as I said, I'd love to see both of these options implemented in the repo, if you're willing to contribute back.
Thanks for the explanation! So does that mean that now, at least for lambda-DCS, we can totally get rid of `parsimonious_languages` and `nltk_languages`?
If you want lambda-DCS, you need the nltk language. We don't have a way to handle variables with the `DomainLanguage` grammar induction. For WikiTableQuestions, though, we found that using a different language was better than lambda-DCS (probably because of the difficulty of integrating with SEMPRE for program execution), so we don't actually use lambda-DCS for anything at this point.
Ok. It looks like it's still kind of convoluted to define the actions for lambda-DCS, even if I only need to generate logical forms without executing them.
`DomainLanguage` can only be used to define Lisp-like languages, right?
Yes, the `DomainLanguage` currently only allows for programs that are a single lisp-like function execution.
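Since programs are single lisp-like function applications, the grammar that gets induced is just a set of typed function-application rules. As a hedged illustration, using the `TokenSequenceLanguage` sketch from above (`get_nonterminal_productions` is the generic `DomainLanguage` method; the production strings in the comments are indicative rather than verbatim):

```python
language = TokenSequenceLanguage({"hello", "world"})

# Each type annotation becomes a nonterminal and each predicate or constant
# becomes a production rule; these rules define the decoder's action space.
for nonterminal, productions in language.get_nonterminal_productions().items():
    print(nonterminal, "->", productions)
# Roughly (exact formatting may differ):
#   @start@ -> ['List[str]']
#   List[str] -> ['[<str,List[str]:List[str]>, str, List[str]]', 'empty_list']
#   str -> ['hello', 'world']
#   <str,List[str]:List[str]> -> ['add_token']
```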
I am trying to implement several semantic parsers for lambda-DCS. I know production rules can be automatically generated from the functions we define inside `DomainLanguage`, and that we can build on those production rules to define the actions and transition functions that are the basic building blocks of the AllenNLP semantic parsing framework.

Before relying on `DomainLanguage` to define the grammar for lambda-DCS, I want to first implement a vanilla seq2seq semantic parser that assumes no constraints on the grammar. Of course, I could implement this directly as a seq2seq model; however, I want to take advantage of the semantic parsing framework provided by AllenNLP (e.g., beam search, copy mechanisms, etc.). So my question is: what is my best bet for doing this? Do I need to use `DomainLanguage` to define a no-grammar grammar that allows any sequence of tokens from a vocabulary?