200ok-ch / org-parser

org-parser is a parser for the Org mode markup language for Emacs.
GNU Affero General Public License v3.0

Parse semantic blocks where appropriate #32

Open schoettl opened 3 years ago

schoettl commented 3 years ago

In #7, we came to the conclusion that it's good to parse semantic blocks (instead of only line-based parsing), but only if it's possible and clean in EBNF/instaparse.

Here is a list of some semantic blocks that would need changes in EBNF:

The following elements cannot be parsed as semantic elements:

Some of them are already defined in EBNF but not yet "activated".


Quoting from #11:

In this branch, I am working on the higher-level syntax as described in https://orgmode.org/worg/dev/org-syntax.html

Specifically, I want to find out whether we can move away from line-based parsing towards more semantic blocks, called "elements". The Org mode parser used for export is also named after them: org-element.el.

The spec says that most elements of the syntax are not context-free, and that these elements fall into three categories: “greater elements”, “elements”, and “objects”.

Greater elements are, for example, #+BEGIN_EXAMPLE blocks. Some of these blocks contain raw text (EXAMPLE, SRC, COMMENT, ...), others can contain formatted text (CENTER, QUOTE, ...). Hence, it's better to parse context-aware: treat the multi-line content of an EXAMPLE block as raw text, but parse the content of a CENTER block as formatted text.

Also, paragraphs, multi-line footnote definitions, lists, tables, and property drawers are probably better parsed as units than line by line.
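
To make the raw-versus-formatted distinction concrete, here is a minimal sketch in Python. It is not org-parser code and does not use EBNF/instaparse; the function name `parse_blocks`, the two name sets, and the output shape are assumptions made for illustration only. Treating unknown block names like formatted blocks is also an assumption.

```python
import re

# Blocks whose content is kept verbatim (names taken from the comment above).
RAW_BLOCKS = {"EXAMPLE", "SRC", "COMMENT"}

BEGIN_RE = re.compile(r"^\s*#\+BEGIN_(\w+)", re.IGNORECASE)


def parse_blocks(text):
    """Group lines into blocks and plain lines (hypothetical helper).

    Raw blocks keep their body as one string; every other block has its
    body parsed recursively, so nested markup is recognised only there.
    """
    lines = text.splitlines()
    result = []
    i = 0
    while i < len(lines):
        match = BEGIN_RE.match(lines[i])
        if match:
            name = match.group(1).upper()
            end_re = re.compile(
                r"^\s*#\+END_" + re.escape(name) + r"\s*$", re.IGNORECASE
            )
            # Look for the matching #+END_<name> line.
            j = i + 1
            while j < len(lines) and not end_re.match(lines[j]):
                j += 1
            if j < len(lines):
                body = "\n".join(lines[i + 1:j])
                if name in RAW_BLOCKS:
                    result.append({"type": "block", "name": name, "raw": body})
                else:
                    result.append(
                        {"type": "block", "name": name,
                         "children": parse_blocks(body)}
                    )
                i = j + 1
                continue
        # Not a block start, or the block is never closed: keep the line.
        result.append({"type": "line", "text": lines[i]})
        i += 1
    return result
```

With this sketch, a CENTER block nested inside an EXAMPLE block stays plain text, while the same CENTER block at top level becomes a node with parsed children. That is the context dependence a purely line-based grammar cannot express.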


Parsing semantic blocks can later be enabled by changing the EBNF:

- <line> = (headline / drawer-begin-line / drawer-end-line / … / content-line) eol
+ <line> = (headline / drawer / … / content-line) eol
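
The difference between the two rule variants can be illustrated outside the grammar. The Python sketch below first classifies lines one by one, mirroring `drawer-begin-line` / `drawer-end-line`, and then folds each begin/content/end run into a single `drawer` node. The function names and tuple shapes are hypothetical, headlines are not distinguished from content lines, and the grouping is done as a post-processing step, whereas the issue proposes doing it directly in EBNF.

```python
import re

DRAWER_BEGIN = re.compile(r"^\s*:([\w-]+):\s*$")
DRAWER_END = re.compile(r"^\s*:END:\s*$", re.IGNORECASE)


def parse_line_based(text):
    """Classify every line on its own (hypothetical line-based pass)."""
    tokens = []
    for line in text.splitlines():
        if DRAWER_END.match(line):
            tokens.append(("drawer-end-line",))
        elif (match := DRAWER_BEGIN.match(line)):
            tokens.append(("drawer-begin-line", match.group(1)))
        else:
            tokens.append(("content-line", line))
    return tokens


def group_drawers(tokens):
    """Fold begin / content / end runs into one ("drawer", name, lines) node."""
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token[0] == "drawer-begin-line":
            j = i + 1
            while j < len(tokens) and tokens[j][0] == "content-line":
                j += 1
            if j < len(tokens) and tokens[j][0] == "drawer-end-line":
                body = [t[1] for t in tokens[i + 1:j]]
                out.append(("drawer", token[1], body))
                i = j + 1
                continue
        # Unmatched begin lines and all other tokens pass through unchanged.
        out.append(token)
        i += 1
    return out
```

Usage: `group_drawers(parse_line_based(text))` yields one `drawer` tuple per complete drawer instead of separate begin and end tokens that a later stage would have to pair up.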