dabeaz / sly

Sly Lex Yacc

Multiple objects in one text file #72

Open AngledLuffa opened 3 years ago

AngledLuffa commented 3 years ago

I've been building a parser whose goal is to read multiple objects from the same file (trees from a Penn Treebank-style parser dataset, in case that helps). Currently my top-level rule handles one tree, but I expect each file to contain multiple trees. Is there a simple solution for this? I've tried treating the parser as a generator, which doesn't seem to work, and if I just pass in a file with multiple trees in it, the parser complains about a syntax error when it starts reading the second tree.

So far the best idea I have is to make a list of objects, but it seems like there must be a better way.

So, instead of this:

class TreeParser(Parser):
    tokens = TreeLexer.tokens

    # the extra layer of productions at the top is so that we can handle trees such as
    #  ((tree stuff))
    # by adding a ROOT node at the very top
    @_('LPAREN factor RPAREN')
    def root(self, p):
        return Tree(label="ROOT", children=[p.factor])

    @_('factor')
    def root(self, p):
        return p.factor

Now I have the following productions at the top instead (the list comes out backwards, but I'll work around that):

    # TODO: hopefully there's some other way to parse multiple trees from one file
    @_('root treelist')
    def treelist(self, p):
        trees = p.treelist
        trees.append(p.root)
        return trees

    @_('root')
    def treelist(self, p):
        return [p.root]

I can actually link the full parser if that helps. It's pretty simple.

Thanks!

AngledLuffa commented 3 years ago

Addendum: without having looked at the source, is there an efficiency difference between root treelist and treelist root? Hopefully both run in linear time?
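To make the question concrete, here is a toy hand simulation of the shift and reduce moves a textbook LR parser would make for each form. This is not sly's code: `simulate` is a hypothetical helper, and it assumes each root has already been reduced to a single stack entry. It only counts reductions and tracks how deep the stack gets.

```python
def simulate(n_roots, left_recursive):
    """Return (peak stack depth, number of treelist reductions).

    Toy model of textbook shift-reduce behaviour, assuming n_roots >= 1.
    """
    stack = []
    peak = 0
    reductions = 0

    if left_recursive:
        # treelist : treelist root | root
        for _ in range(n_roots):
            stack.append("root")
            peak = max(peak, len(stack))
            # Either "root" or "treelist root" is on the stack now, and
            # both reduce to a single "treelist" entry right away.
            stack = ["treelist"]
            reductions += 1
    else:
        # treelist : root treelist | root
        for _ in range(n_roots):
            # Nothing can be reduced until the last root has been seen.
            stack.append("root")
            peak = max(peak, len(stack))
        stack[-1] = "treelist"          # reduce: treelist -> root
        reductions += 1
        while len(stack) > 1:
            stack[-2:] = ["treelist"]   # reduce: treelist -> root treelist
            reductions += 1

    return peak, reductions


left = simulate(1000, left_recursive=True)
right = simulate(1000, left_recursive=False)
```

Under these assumptions both forms do one list reduction per tree, so both are linear in the number of trees. The difference is the peak stack depth: constant for treelist root, but growing with the number of trees for root treelist.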