Lexpedite / clean

MIT License

Add Span elements #7

Closed Gauntlet173 closed 2 years ago

Gauntlet173 commented 2 years ago

To subdivide large sections of text and make the encodings more explainable, Blawx needs a way to specify <span> elements with eId attributes inside <p> elements.

Spans can be hierarchical, but I feel like using indented blocks to represent them would make the clean source text look less like its source material. We could use something like the Markdown link format, with square brackets followed by a name in parentheses, but nestable. But that seems like a lot of parentheses. We could put start and end markers on their own lines, but that would be a visual mess, too.

1. This is a long section of law that sets out that [default]{by default, 
birds fly}, [penguins]{but some birds are penguins}, [exceptions]{and [penguin exception]{if a bird is a
penguin}, [injured exception]{or it is injured}, it does not fly}.

It might work to make the naming optional? We also want to avoid colliding with notation that is already used in legislation, such as the styles used to set out abbreviations. We also want to avoid wrongly implying that the spans correspond to something like order of operations within the semantics of the sentence. For that reason, parentheses are probably not ideal around the actual text. Markdown square brackets might also be confusing for people familiar with them. Maybe curly brackets, with square for the names?

I think optional names are probably not part of the MVP. Let's mandate the names for now, use curly brackets for the text and square brackets for the names, and see how that goes. For consistency with how the rest of the metaphor works, we should probably prepend the names. That will make it easier to figure out what's going on when you click on the selectors in Blawx, which are all prepended, and it matches how all of the other hierarchical elements are prepended in the Canadian style.
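To make the proposed syntax concrete, here is a minimal sketch of how nested [name]{text} spans could be parsed into a tree. It is a hand-written recursive parser using only the standard library, not the pyparsing grammar used in clean. The names Span, parse_spans and span_names are hypothetical, and the rule that a name may not contain brackets, braces or newlines is an assumption made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Node = Union[str, "Span"]


@dataclass
class Span:
    """A named span; children are plain strings or nested Span objects."""
    name: str
    children: List[Node] = field(default_factory=list)


# Assumption: a span opens with "[name]{" and the name holds no brackets,
# braces or newlines. A bare "[...]" not followed by "{" stays plain text.
_SPAN_OPEN = re.compile(r"\[([^\[\]{}\n]+)\]\{")


def _parse(text: str, pos: int, nested: bool) -> Tuple[List[Node], int]:
    nodes: List[Node] = []
    start = pos  # start of the current run of plain text
    while pos < len(text):
        opener = _SPAN_OPEN.match(text, pos)
        if opener:
            if pos > start:
                nodes.append(text[start:pos])
            # Recurse to collect the span body up to its closing brace.
            children, pos = _parse(text, opener.end(), nested=True)
            nodes.append(Span(opener.group(1), children))
            start = pos
        elif text[pos] == "}":
            if not nested:
                raise ValueError(f"unmatched '}}' at offset {pos}")
            if pos > start:
                nodes.append(text[start:pos])
            return nodes, pos + 1
        else:
            pos += 1
    if nested:
        raise ValueError("span opened but never closed")
    if pos > start:
        nodes.append(text[start:pos])
    return nodes, pos


def parse_spans(text: str) -> List[Node]:
    """Parse text containing nestable [name]{text} spans into a tree."""
    nodes, _ = _parse(text, 0, nested=False)
    return nodes


def span_names(nodes: List[Node]) -> List[str]:
    """List span names depth-first, in document order."""
    names: List[str] = []
    for node in nodes:
        if isinstance(node, Span):
            names.append(node.name)
            names.extend(span_names(node.children))
    return names
```

Because a span only opens when the square-bracketed name is immediately followed by a curly bracket, bracketed text that already appears in legislation is left alone. A stray closing curly bracket in the source text is reported as an error in this sketch, so a real grammar would need an escape for it.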

Gauntlet173 commented 2 years ago

Created the add_span branch to work on this.

Gauntlet173 commented 2 years ago

See example at https://github.com/pyparsing/pyparsing/blob/master/examples/nested_markup.py

Gauntlet173 commented 2 years ago

The add_span branch now has the span elements added to the parser, and tests are passing. It still needs tests to demonstrate that the span parse structure exists inside legal texts in the larger examples. Then I need to update the AN generation.
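As a rough illustration of the generation step, assuming "AN" here means Akoma Ntoso XML, the sketch below turns a nested span structure into a <p> element containing <span> elements with eId attributes. The input format (strings and (name, children) tuples), the function names, and the eId scheme of joining names with double underscores are all hypothetical. They are not taken from clean or from the Akoma Ntoso naming convention.

```python
import xml.etree.ElementTree as ET


def make_eid(parent_eid, name):
    # Hypothetical scheme: lowercase the name, join words with "_",
    # and prefix it with the enclosing element's eId.
    slug = "_".join(name.lower().split())
    return f"{parent_eid}__{slug}"


def append_nodes(parent, nodes, parent_eid):
    """Append mixed text and span nodes to an ElementTree element."""
    last = None  # most recently added child element, if any
    for node in nodes:
        if isinstance(node, str):
            # Text before the first child is .text; later text is the
            # .tail of the preceding child element.
            if last is None:
                parent.text = (parent.text or "") + node
            else:
                last.tail = (last.tail or "") + node
        else:
            name, children = node
            eid = make_eid(parent_eid, name)
            last = ET.SubElement(parent, "span", {"eId": eid})
            append_nodes(last, children, eid)


def render_paragraph(nodes, parent_eid):
    """Render the nodes as a <p> element serialized to a string."""
    p = ET.Element("p")
    append_nodes(p, nodes, parent_eid)
    return ET.tostring(p, encoding="unicode")


example = [
    "Birds fly, ",
    ("exceptions", [
        "unless ",
        ("penguin exception", ["the bird is a penguin"]),
        " or injured",
    ]),
    ".",
]
xml_text = render_paragraph(example, "sec_1")
```

Here xml_text holds a single <p> element in which the outer span carries eId="sec_1__exceptions" and the nested span carries eId="sec_1__exceptions__penguin_exception".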