I'm creating this as a matter of good practice, so we can also capture key discussion points from our live sessions and the Discord discussion that takes place around the sessions.
Long story short, we already agree that the first step in the process is to implement tokenization, so that further pieces of the "lake house pipeline" can process OData protocol questions and ultimately translate those into usable business cases ...
The general concept discussed here involves taking the raw "character stream" (the string from a URI's query) and breaking it into something that identifies the key pieces of the protocol question being asked (a rough sketch of this appears after the list below).
Key points to clarify ...
Token types (e.g. operands, separators, values, or are there more?)
Character space limitations (e.g. ASCII only, or do we assume the full UTF-16 character set is valid?)
Scoping of what is to be handled (e.g. any string from a URI, or only the OData parts?)
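
To make the discussion concrete, here is a minimal hand-rolled sketch of what such a tokenizer could look like. The token kinds, the operator list, and the function name are assumptions for discussion, not a settled design, and it only covers a $filter-style expression, not the full OData grammar:

```python
from dataclasses import dataclass

# Minimal sketch only: the token kinds and operator list below are
# assumptions for discussion, not a settled design.
@dataclass
class Token:
    kind: str   # "identifier", "operator", "separator", or "literal"
    text: str

OPERATORS = {"eq", "ne", "gt", "ge", "lt", "le", "and", "or", "not"}
SEPARATORS = set("(),")

def tokenize(query: str) -> list[Token]:
    """Break a $filter-style expression into labelled tokens."""
    tokens: list[Token] = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1                                  # skip whitespace between tokens
        elif ch in SEPARATORS:
            tokens.append(Token("separator", ch))
            i += 1
        elif ch == "'":
            # Quoted string literal; OData escapes a quote by doubling it ('').
            j = i + 1
            while j < n:
                if query[j] == "'" and not (j + 1 < n and query[j + 1] == "'"):
                    break                           # unescaped closing quote
                j += 2 if query[j] == "'" else 1    # treat '' as one unit
            if j >= n:
                raise ValueError(f"unterminated string literal at offset {i}")
            tokens.append(Token("literal", query[i:j + 1]))
            i = j + 1
        elif ch.isalnum() or ch == "_":
            # Identifier, keyword operator, or numeric literal.
            j = i
            while j < n and (query[j].isalnum() or query[j] in "._"):
                j += 1
            word = query[i:j]
            if word in OPERATORS:
                kind = "operator"
            elif word[0].isdigit():
                kind = "literal"
            else:
                kind = "identifier"
            tokens.append(Token(kind, word))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

For example, `tokenize("Price gt 20 and Category eq 'Books'")` yields identifier, operator, and literal tokens in order, which is roughly the shape the downstream pipeline pieces would consume.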
Hassan has also introduced the idea of solving this with Regular Expressions ... should we?
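
For comparison, a regex-driven version could look roughly like this (again, the pattern and kind names are assumptions for discussion, not necessarily what Hassan has in mind):

```python
import re

# Rough regex-driven equivalent of the hand-rolled sketch above.
TOKEN_PATTERN = re.compile(r"""
    (?P<literal>'(?:[^']|'')*'|\d+(?:\.\d+)?)            # quoted string or number
  | (?P<operator>\b(?:eq|ne|gt|ge|lt|le|and|or|not)\b)   # keyword operators
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_.]*)              # property paths etc.
  | (?P<separator>[(),])
  | (?P<ws>\s+)
""", re.VERBOSE)

def tokenize_re(query: str):
    """Yield (kind, text) pairs; raises on anything the pattern can't place."""
    pos = 0
    while pos < len(query):
        m = TOKEN_PATTERN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":                # drop whitespace tokens
            yield (m.lastgroup, m.group())
        pos = m.end()

# list(tokenize_re("Price gt 20 and Category eq 'Books'"))
```

A regex keeps the token definitions declarative and compact, though the '' escape inside string literals and precise error reporting tend to be where it gets fiddly; worth weighing both before we commit either way.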