bablr-lang / bablr-vm

A VM for enforcing language rules on agAST trees
MIT License

Source layer: unicode boundaries #37

Open conartist6 opened 1 year ago

conartist6 commented 1 year ago

In a semantically layered architecture, the source layer should be responsible for matters of Unicode. These matters are computable at the character level and require no grammar, only access to a Unicode database. There is significant value in making the application of the Unicode database to the text explicit: the text can then be passed between different runtimes, and it will be evident whether something has been lost in translation, for example when one parser sees a grapheme cluster where another does not. This situation can arise even between versions of the cst-tokens core, since each core will have one and only one Unicode database, and upgrading that database is necessarily somewhat breaking.
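
To make the idea concrete, here is a minimal Python sketch of what "explicit" boundaries could look like. It is not the bablr-vm or cst-tokens API: `SegmentedSource`, `approximate_cluster_starts`, `annotate`, and `lost_in_translation` are hypothetical names, and the boundary rule is a deliberate simplification (combining marks attach to the preceding code point) rather than the full grapheme cluster rules of UAX #29. The only real library calls are `unicodedata.combining` and `unicodedata.unidata_version`.

```python
import unicodedata
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SegmentedSource:
    """Text plus the boundaries one runtime computed for it (hypothetical type)."""
    text: str
    unicode_version: str          # database version that produced `boundaries`
    boundaries: Tuple[int, ...]   # code point offsets at which a cluster starts


def approximate_cluster_starts(text: str) -> Tuple[int, ...]:
    # Simplification: a code point with a nonzero canonical combining class
    # is attached to the preceding code point. Real grapheme segmentation
    # (UAX #29) has many more rules; this only illustrates the shape.
    return tuple(
        i for i, ch in enumerate(text)
        if i == 0 or unicodedata.combining(ch) == 0
    )


def annotate(text: str) -> SegmentedSource:
    # Record the boundaries together with the database version used.
    return SegmentedSource(
        text, unicodedata.unidata_version, approximate_cluster_starts(text)
    )


def lost_in_translation(received: SegmentedSource) -> bool:
    # True when this runtime's database segments the text differently
    # from the runtime that produced `received`.
    return annotate(received.text).boundaries != received.boundaries


if __name__ == "__main__":
    src = annotate("e\u0301a")  # "e" + COMBINING ACUTE ACCENT + "a"
    print(src.unicode_version, src.boundaries)
    print(lost_in_translation(src))
```

Because the boundaries travel with the text, a receiving runtime does not have to trust that its own database agrees with the sender's: it can recompute and compare, and a mismatch in `unicode_version` tells it where to look first.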