Legilibre / DuraLex

DuraLex is a French bill compiler.
GNU Affero General Public License v3.0

Properly manage lists #9

Open Seb35 opened 6 years ago

Seb35 commented 6 years ago

Converting to PEGs is becoming difficult because we have to choose a good design for managing lists and combine it properly with hierarchical items. I mean expressions like "Au deuxième alinéa, à la troisième phrase du quatrième alinéa" ("In the second alinea, in the third sentence of the fourth alinea"). A good real-world stress test is http://www.assemblee-nationale.fr/15/textes/0911.asp#D_Article_11.

Legacy DuraLex creates this tree:

{"type": "alinea-reference", "order": 2, "children": [
  {"type": "sentence-reference", "order": 3, "children": [
    {"type": "alinea-reference", "order": 4}]}]}

The first version of ToSemanticTreeVisitor (now in its own file) creates this tree:

{"type": "alinea-reference", "order": 2},
{"children": [
  {"type": "sentence-reference", "order": 3},
  {"type": "alinea-reference", "order": 4}]}

With b2f13ab I am trying a new design for ToSemanticTreeVisitor. It works at small scale in my experiments and creates:

{"type": "alinea-reference", "order": 2},
{"type": "alinea-reference", "order": 4, "children": [
  {"children": [{"type": "sentence-reference", "order": 3}]}]}

There is currently a small, easy-to-solve issue: it creates an untyped container node, which it shouldn't for a single child. I haven't tried it at a larger scale.
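To make the single-child issue concrete, here is a minimal sketch of one possible fix, working on plain dicts shaped like the tree above rather than on the real visitor's nodes. The function name `collapse_untyped` is hypothetical and not part of DuraLex; it only illustrates replacing an untyped container that holds exactly one child by that child.

```python
def collapse_untyped(node):
    """Return a copy of `node` where every untyped container holding
    exactly one child is replaced by that child.

    Hypothetical helper for illustration; it assumes nodes are plain
    dicts with optional "type" and "children" keys."""
    children = [collapse_untyped(child) for child in node.get("children", [])]
    # An untyped node with a single child carries no information of its own.
    if "type" not in node and len(children) == 1:
        return children[0]
    result = {key: value for key, value in node.items() if key != "children"}
    if children:
        result["children"] = children
    return result


tree = {"type": "alinea-reference", "order": 4, "children": [
    {"children": [{"type": "sentence-reference", "order": 3}]}]}
collapsed = collapse_untyped(tree)
```

With this sketch, the sentence reference ends up as a direct child of the alinea reference. Untyped containers with several children are left alone, since they may represent a genuine list.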

I don't know whether this is a good design. The difficulty is deciding between a flat list and a hierarchical tree (or some combination of both). Another possible design would be to take the canonical hierarchy (word < sentence < alinea < article) into account when merging child (Parsimonious) nodes.
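The hierarchy-aware merge could be sketched roughly as follows. This is only an illustration under several assumptions: references arrive as a flat list of plain dicts in reading order, a finer-grained reference followed by a coarser one is nested inside it (as in "à la troisième phrase du quatrième alinéa"), and the names `RANK`, `merge_by_hierarchy`, `"word-reference"` and `"article-reference"` are hypothetical. It does not operate on real Parsimonious nodes.

```python
# Canonical hierarchy, finest to coarsest. "word-reference" and
# "article-reference" are assumed names modeled on the two types above.
RANK = {
    "word-reference": 0,
    "sentence-reference": 1,
    "alinea-reference": 2,
    "article-reference": 3,
}


def merge_by_hierarchy(flat_refs):
    """Group a flat list of references into a list of small trees.

    A reference that is coarser than the one being built becomes its
    parent; any other reference starts a new list item."""
    merged = []
    current = None
    for ref in flat_refs:
        ref = dict(ref)  # shallow copy so the input is not modified
        if current is not None and RANK[ref["type"]] > RANK[current["type"]]:
            # Coarser level: wrap what was built so far ("X of Y").
            ref["children"] = [current]
            current = ref
        else:
            # Same or finer level: close the previous item, start a new one.
            if current is not None:
                merged.append(current)
            current = ref
    if current is not None:
        merged.append(current)
    return merged


flat = [
    {"type": "alinea-reference", "order": 2},
    {"type": "sentence-reference", "order": 3},
    {"type": "alinea-reference", "order": 4},
]
merged = merge_by_hierarchy(flat)
```

On the example sentence this yields two list items: alinea 2 alone, then alinea 4 containing sentence 3. It only covers the "finer, then coarser" word order, so other phrasings would need extra rules.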

I think a reasonably good hierarchy is needed at parsing time, before any DuraLex visitor runs, since visitors cannot re-create missing information. But some visitors will probably need to be adapted to handle both flat lists and hierarchical items.

Seb35 commented 6 years ago

@promethe42: it would be great if you could study this so we can discuss it. @mdamien: FYI.

Also, I had to handle this same issue for alineas in metslesliens. It was a bit easier because the space is smaller than in DuraLex. I solved it with an accumulate-collect mechanism at each scale; the depth of this structure is 3. The rules are defined here (with their behaviour relative to other rules) and the Parsimonious visitor is here.