In a semantically layered architecture, the source layer should be responsible for matters of Unicode. These matters are computable at the character level: they require no grammar, only access to a Unicode database. There is significant value in making the application of the Unicode database to the text explicit. The text can then be passed between different runtimes, and it will be evident whether something has been lost in translation, e.g. one parser sees a grapheme cluster where another does not. This situation is possible even between versions of the cst-tokens core, since each core will have one and only one Unicode database, and upgrades to that database are necessarily somewhat breaking.
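
To make the idea of an explicit Unicode layer concrete, here is a minimal sketch in Python. It is not the cst-tokens implementation. The names `AnnotatedSource`, `annotate`, and `disagreements` are hypothetical, and because Python's standard library has no grapheme cluster segmentation, the sketch uses each code point's general category as a stand-in for whatever properties a real source layer would record. The point is only that the annotations travel with the text, together with the version of the database that produced them, so a receiving runtime can check them against its own database.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotatedSource:
    """Text plus the Unicode facts one runtime computed for it (hypothetical shape)."""
    text: str
    unicode_version: str          # version of the database that produced the annotations
    categories: Tuple[str, ...]   # one general category per code point


def annotate(text: str) -> AnnotatedSource:
    """Apply this runtime's Unicode database to the text and record the result."""
    return AnnotatedSource(
        text=text,
        unicode_version=unicodedata.unidata_version,
        categories=tuple(unicodedata.category(ch) for ch in text),
    )


def disagreements(received: AnnotatedSource) -> List[int]:
    """Return the code point indices where the local database disagrees with the sender's."""
    local = annotate(received.text)
    return [
        i
        for i, (theirs, ours) in enumerate(zip(received.categories, local.categories))
        if theirs != ours
    ]


if __name__ == "__main__":
    source = annotate("e\u0301")  # "e" followed by a combining acute accent
    print(source.unicode_version, source.categories)
    print(disagreements(source))  # empty when sender and receiver share a database
```

A receiver that gets a non-empty list from `disagreements`, or sees a `unicode_version` different from its own, knows that something may have been lost in translation before any grammar is involved.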