erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Feature Suggestion: Generate railroad diagrams from grammars #104

Status: Open. lucaswiman opened this issue 7 years ago.

lucaswiman commented 7 years ago

I came across a genomics library called hgvs that has really awesome documentation for its grammar in the form of "railroad diagrams": http://hgvs.readthedocs.io/en/0.4.x/grammar.html (similar to the grammar diagrams in SQLite's documentation). They describe their generation mechanism as a "fragile hack", but the syntrax library now seems to offer pretty good support for this.

It would be really nice to be able to generate diagrams like this from a parsimonious grammar. I'd be interested in implementing a syntrax integration, and possibly a Sphinx plugin.

@erikrose: Do you think this makes more sense as a submodule of parsimonious (with an optional dependency on syntrax), or as a separate library? Keeping it inside parsimonious would keep it compatible as parsimonious changes, but would add to the maintenance burden. If you'd prefer a separate library, please feel free to close this issue.

erikrose commented 7 years ago

Hi, Lucas. Thanks for getting in touch! RR diagrams are a neat idea, and I think they could be a selling point of the package, so let's see what happens if we do it internally. As long as we have good test coverage and don't double the size of the codebase, I'm pretty happy.

I took a quick glance at syntrax and saw that it has a GTK dependency for which there's no proper Python package, so that's one thing to look into. I haven't done much with GTK, so I don't know how much of a problem that is.

A Sphinx plugin would be sweet.

lucaswiman commented 7 years ago

OK, thanks! I'll start looking into this more, and I'll submit a PR if/when I make progress.


lucaswiman commented 7 years ago

@erikrose I'm making a fair amount of progress with this. One thing I'd like to do is make the iteration order of Grammar.items() match the order in which the rules appear in the original grammar. My current solution just re-parses the grammar and visits the rules to get the ordering, but it'd be nicer to make Grammar inherit from OrderedDict. Unfortunately, that would either:

  1. Break Python 2.6 compatibility, or
  2. Require adding a backport like ordereddict.

Implementing (2) isn't horribly difficult, but would you be OK with dropping Python 2.6 compatibility? Its EOL was more than 3 years ago, and many major packages like distutils and Django have dropped support for it.
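
A rough, stdlib-only sketch of the re-parsing stopgap described above: recover the rule order by scanning the grammar source, then rebuild the rule mapping as an OrderedDict in that order. This is an assumption-laden heuristic, not Parsimonious's actual approach: it assumes every rule definition starts on its own line as `name = ...`, and the helpers `rule_order` and `ordered_rules` are hypothetical.

```python
import re
from collections import OrderedDict

# Heuristic: a rule definition begins with "name =" at the start of a line.
# Comment lines ("# ...") and continuation lines ("/ ...") never match,
# because the pattern requires an identifier right after leading whitespace.
RULE_START = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", re.MULTILINE)


def rule_order(grammar_text):
    """Return rule names in the order they are defined in grammar_text."""
    names = []
    for match in RULE_START.finditer(grammar_text):
        name = match.group(1)
        if name not in names:  # keep only the first definition of each name
            names.append(name)
    return names


def ordered_rules(grammar_text, rules):
    """Reorder an unordered rule mapping to follow the grammar's source order."""
    return OrderedDict((name, rules[name]) for name in rule_order(grammar_text))
```

Backing Grammar with an OrderedDict directly would make a helper like this unnecessary, since insertion order would already be definition order.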

erikrose commented 7 years ago

Sorry for the delay. Of the 3 libs you found (if you want an opinion), I lean toward railroad-diagrams: easy installation, no external binaries to require, SVG output for the at-the-moment-dominant web platform, and (to me) pretty output.

Given that the railroad-diagrams code itself is 1000 lines (and that's before anything you add), I'm reconsidering whether we should roll this into Parsimonious proper, which is only 1456 lines altogether. Shall we shoot for external and just make sure Parsimonious has nice, stable interfaces for it to hook up to? I'm still very interested in showing it off in Parsimonious's docs and using it as a selling point, but this way we get the best of both worlds: lightness for those who want it and power for those who want it.

I'm up for dropping 2.6 support and backing Grammar with an ordereddict. It's 2017, after all.

lucaswiman commented 7 years ago

I've submitted a PR to railroad-diagrams (https://github.com/tabatkins/railroad-diagrams/pull/44) that adds a setup.py file and fixes Python 3 compatibility. The author was pretty receptive to a previous PR I submitted, so hopefully that'll go smoothly.

> Shall we shoot for external and just make sure Parsimonious has nice, stable interfaces for it to hook up to?

The interfaces of both railroad-diagrams and syntrax are pretty similar to each other, and to parsimonious.expressions (some of the classes even have the same names). So an "interface" would consist of a mapping of diagram elements to parsimonious expressions, with a bit of glue code and special-casing. Putting that into the calling code could be pretty ugly.
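
To make the "mapping plus glue code" idea concrete, here is a minimal sketch of such a translation. The node classes are hypothetical stand-ins that only borrow the names of a few parsimonious.expressions classes (`Reference` in particular is invented to represent a use of another rule by name), and the output is a tree of plain tuples labeled with railroad-diagrams element names rather than real railroad-diagrams objects, whose constructors may need extra arguments (for example, a default branch for a choice).

```python
from dataclasses import dataclass


# Hypothetical stand-ins for a few expression node types. The real
# parsimonious classes carry more state (rule names, compiled regexes, ...).
@dataclass
class Literal:
    literal: str


@dataclass
class Regex:
    pattern: str


@dataclass
class Reference:  # invented: a use of another rule by name
    name: str


@dataclass
class Sequence:
    members: tuple


@dataclass
class OneOf:
    members: tuple


@dataclass
class ZeroOrMore:
    member: object


def to_diagram(expr):
    """Translate an expression tree into nested (element, *children) tuples."""
    if isinstance(expr, Literal):
        return ("Terminal", repr(expr.literal))
    if isinstance(expr, Regex):
        return ("Terminal", "~" + repr(expr.pattern))
    if isinstance(expr, Reference):
        return ("NonTerminal", expr.name)
    if isinstance(expr, Sequence):
        return ("Sequence",) + tuple(to_diagram(m) for m in expr.members)
    if isinstance(expr, OneOf):
        return ("Choice",) + tuple(to_diagram(m) for m in expr.members)
    if isinstance(expr, ZeroOrMore):
        return ("ZeroOrMore", to_diagram(expr.member))
    # The special-casing lives here: anything unmapped is reported explicitly.
    raise TypeError(f"no diagram element for {type(expr).__name__}")
```

Most branches are one-to-one renames, which is why the adapter stays small; the awkward cases (lookaheads, negations) would be where the special-casing accumulates.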

It's actually not that much code, so having railroad-diagrams as an optional dependency for the module seems reasonable. I think it might make sense to support both railroad-diagrams (which has easy installation, but makes less appealing diagrams) and syntrax (which has a byzantine installation process, but makes top-notch diagrams).
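
One common way to keep a dependency optional is to defer the import until the feature is actually used, so that importing the main package never requires it. A minimal sketch; `require_optional` is a hypothetical helper, not part of Parsimonious.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency, or fail with an actionable message.

    Importing at call time (rather than at module import time) means users
    who never ask for diagrams never need the dependency installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; {install_hint}"
        ) from exc
```

A diagram function could then call something like `require_optional("railroad", "install railroad-diagrams with pip")`, assuming the package is importable as `railroad`.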

The API I'm thinking of is roughly the following:

from typing import OrderedDict, Sequence

from parsimonious.grammar import Grammar


def convert_grammar_to_diagram(grammar: Grammar,
                               collapsible: Sequence[str] = (),
                               engine: str = 'railroad_diagrams') -> OrderedDict[str, bytes]:
    """
    Return an OrderedDict mapping rule names to bytes objects containing a
    diagrammatic representation of the rule.

    Args:
        grammar: ...
        collapsible:
            A collection of rule names where references can be collapsed.
            This can be useful for hierarchical grammars, where the diagrams
            of most rules are just a straight line or a disjunction. Including
            the diagram of the referent rather than the reference can show
            larger structural elements, or eliminate rule names which are only
            included so they can be visited.
        engine:
            The rendering engine to use for the images. Either "railroad_diagrams"
            to generate a simple, portable SVG representation of the diagram, or
            "syntrax" to generate a high-quality PNG laid out by the cairo layout
            engine. See the respective packages for installation instructions.
    """

lucaswiman commented 7 years ago

> I think it might make sense to support both railroad-diagrams (which has easy installation, but makes less appealing diagrams) and syntrax (which has a byzantine installation process, but makes top-notch diagrams).

Having looked into how difficult it is to get syntrax running in a virtualenv, even when it's installed in the system Python, I've decided against including support for it.

erikrose commented 7 years ago

Sounds good to me. :-) Thanks for the update!