In #7, we came to the conclusion that it's good to parse semantic blocks (instead of only line-based parsing), but only if it's possible and clean in EBNF/instaparse.
Specifically, I want to check whether we can move away from line-based parsing towards more semantic blocks, called "elements". The Org mode parser used for export is also called org-element.el.
The spec says that most elements of the syntax are not context-free, and that the categories for these elements are “Greater elements”, “elements”, and “objects”.
Greater elements are, for example, #+BEGIN_EXAMPLE blocks. Some of these blocks contain raw text (EXAMPLE, SRC, COMMENT, ...), others can contain formatted text (CENTER, QUOTE, ...). Hence, it's better to parse in a context-aware way: the multi-line content of an EXAMPLE block as raw text, but the content of a CENTER block as formatted text.
Also, paragraphs, multi-line footnote definitions, lists, tables, and property drawers are probably better parsed as units than line by line.
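To make the raw-versus-formatted distinction concrete, here is a minimal sketch in Python rather than EBNF/instaparse. It is only an illustration of the idea, not the project's grammar: the function name `parse_blocks`, the tuple shapes, and the `RAW_BLOCKS` set are assumptions made up for this example. Blocks named in `RAW_BLOCKS` keep their body as one raw string, while every other block has its body parsed again.

```python
import re

# Assumption: illustrative set of blocks whose body is kept as raw text,
# taken from the examples mentioned above (EXAMPLE, SRC, COMMENT).
RAW_BLOCKS = {"EXAMPLE", "SRC", "COMMENT"}

BEGIN_RE = re.compile(r"^\s*#\+BEGIN_(\w+)", re.IGNORECASE)


def parse_blocks(lines):
    """Group lines into block elements; all other lines stay plain lines."""
    result = []
    i = 0
    while i < len(lines):
        m = BEGIN_RE.match(lines[i])
        if not m:
            result.append(("line", lines[i]))
            i += 1
            continue
        name = m.group(1).upper()
        end_re = re.compile(
            r"^\s*#\+END_" + re.escape(name) + r"\s*$", re.IGNORECASE
        )
        # Look for the first matching #+END_<name> line.
        j = i + 1
        while j < len(lines) and not end_re.match(lines[j]):
            j += 1
        if j == len(lines):
            # Unterminated block: treat the begin line as a plain line.
            result.append(("line", lines[i]))
            i += 1
            continue
        body = lines[i + 1:j]
        if name in RAW_BLOCKS:
            content = "\n".join(body)      # raw text, not parsed further
        else:
            content = parse_blocks(body)   # body is parsed recursively
        result.append(("block", name, content))
        i = j + 1
    return result


doc = [
    "#+BEGIN_CENTER",
    "*bold*",
    "#+BEGIN_EXAMPLE",
    "*not bold*",
    "#+END_EXAMPLE",
    "#+END_CENTER",
    "tail",
]
parsed = parse_blocks(doc)
```

Here the CENTER body is parsed again, so the nested EXAMPLE block is recognized, while the EXAMPLE body itself stays a plain string. The sketch stops at the first matching end line, so it does not handle blocks of the same name nested inside each other.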
Here is a list of some semantic blocks that would need changes in EBNF:
#+BEGIN_xxx
#+BEGIN:
: sample code
The following elements cannot be parsed as semantic elements:
Some of them are already defined in EBNF but not yet "activated".
Quoting from #11:
Parsing semantic blocks can later be enabled by changing EBNF: