aldrichtr / PSOrgMode

A module for reading and writing the org-mode markup format

Identify the start and end of input that needs to be passed to other parsers #13

Closed: aldrichtr closed this issue 2 years ago

aldrichtr commented 2 years ago

Input sections

ConvertFrom-OrgMode currently treats input as a stream of lines of text. Each line is evaluated against a regex and acted on in place. To parse the input better, the function should act more like a lexer: input should be separated into tokens that are then processed (parsed) individually.

Lexer functionality

A lexer, by definition, takes input and creates tokens. A token pairs a chunk of text from the original input with a "tag" that identifies the token's type. For orgmode text, the type will be an org class (element, object, etc.) such as 'headline' ...
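
To make the token idea concrete, here is a minimal sketch in Python (not the module's PowerShell, and not how ConvertFrom-OrgMode is implemented). The `Token` class, the `RULES` table, and the `tokenize` function are all hypothetical names invented for illustration, and the three patterns cover only a tiny, simplified slice of org syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str  # the "tag", e.g. 'headline'
    text: str  # the chunk of original input
    line: int  # 1-based line number where the chunk was found


# Hypothetical, deliberately small rule set. Order matters: first match wins.
RULES = [
    ("headline", re.compile(r"^\*+ ")),    # one or more stars, then a space
    ("keyword", re.compile(r"^#\+\w+:")),  # e.g. "#+TITLE: ..."
    ("blank", re.compile(r"^\s*$")),       # empty or whitespace-only line
]


def tokenize(text):
    """Split text into line-level tokens, tagging each with a type."""
    tokens = []
    for number, line in enumerate(text.splitlines(), start=1):
        for type_, pattern in RULES:
            if pattern.match(line):
                tokens.append(Token(type_, line, number))
                break
        else:
            # No rule matched: fall back to a generic 'text' token.
            tokens.append(Token("text", line, number))
    return tokens
```

The point of the sketch is the separation of concerns: `tokenize` only labels chunks of input, and a later stage decides what object to build from each token instead of acting on the line in place.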

Orgmode buffer tokens

ConvertFrom-OrgMode should tokenize the input by:

aldrichtr commented 2 years ago

See my notes in #12 and #14.

This issue is still valid as a feature: ConvertFrom-OrgMode should collect content for processing by other parsers. For the most part, though, objects are created as the lines are parsed.
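
As a rough illustration of "collecting content for other parsers", the Python sketch below records where a `#+BEGIN_...` / `#+END_...` block starts and ends, and gathers the lines in between so they could be handed to another parser. This is an assumption about what such collection might look like, not the module's actual code; `find_blocks` and both patterns are hypothetical.

```python
import re

# Org block delimiters are matched case-insensitively here.
BEGIN = re.compile(r"^\s*#\+begin_(\w+)", re.IGNORECASE)
END = re.compile(r"^\s*#\+end_(\w+)", re.IGNORECASE)


def find_blocks(lines):
    """Return a list of (kind, start, end, body) tuples.

    start and end are the 1-based line numbers of the BEGIN and END
    delimiter lines; body holds the lines strictly between them.
    A block that is never closed is silently dropped in this sketch.
    """
    blocks = []
    open_kind = None
    start = None
    body = []
    for number, line in enumerate(lines, start=1):
        if open_kind is None:
            match = BEGIN.match(line)
            if match:
                open_kind, start, body = match.group(1).lower(), number, []
        else:
            match = END.match(line)
            if match and match.group(1).lower() == open_kind:
                blocks.append((open_kind, start, number, body))
                open_kind = None
            else:
                body.append(line)
    return blocks
```

Given the lines `* Heading`, `#+BEGIN_SRC python`, `print(1)`, `#+END_SRC`, this returns one `src` block spanning lines 2 to 4 whose body is the single line `print(1)`.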