aardvark-platform / aardvark.base

Aardvark.Base is the foundation of the open-source Aardvark Platform for visual computing, real-time graphics, and visualization.
https://aardvarkians.com/
Apache License 2.0

Vrml97 Parser hardcoded to ASCII #46

luithefirst closed this issue 4 years ago

luithefirst commented 4 years ago

According to its specification, VRML uses UTF-8 encoding. All my files also begin with: #VRML V2.0 utf8

The Tokenizer contains a NextChar method. This looks like the place that should handle the UTF-8 decoding.

A .NET string uses UTF-16, so a single 16-bit char is not enough for the up to 21 bits of a code point that UTF-8 can encode. A solution could be to change the return type of NextChar to int, so that the returned value holds the whole code point, which would otherwise take either one or two 16-bit chars.
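To make the size argument concrete, here is a small illustration in Python rather than C# (Python strings are indexed by code point, so the UTF-16 unit count has to be obtained by encoding). The helper names utf8_bytes and utf16_units are made up for this sketch and are not part of Aardvark.Base.

```python
# Illustration only: how many storage units one Unicode code point needs.
# The helper names are hypothetical and not taken from Aardvark.Base.

def utf8_bytes(code_point: int) -> int:
    """Number of bytes UTF-8 needs for this code point."""
    return len(chr(code_point).encode("utf-8"))


def utf16_units(code_point: int) -> int:
    """Number of 16-bit code units UTF-16 needs for this code point."""
    return len(chr(code_point).encode("utf-16-le")) // 2


samples = {
    "LATIN SMALL LETTER A": 0x61,
    "EURO SIGN": 0x20AC,
    "GRINNING FACE": 0x1F600,  # outside the Basic Multilingual Plane
}

for name, cp in samples.items():
    print(f"U+{cp:04X} {name}: "
          f"{utf8_bytes(cp)} UTF-8 byte(s), {utf16_units(cp)} UTF-16 unit(s)")

# The largest code point, U+10FFFF, needs 21 bits, so it fits in a 32-bit int.
print("bits needed for U+10FFFF:", (0x10FFFF).bit_length())
```

Any code point above U+FFFF takes two UTF-16 units (a surrogate pair), which is why a single 16-bit char cannot carry it, while an int can.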

luithefirst commented 4 years ago

The Tokenizer now uses UTF-32 for internal processing; the UTF-8 decoding is handled when the file content is read.
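A rough sketch of that split, again in Python and not the actual Aardvark.Base code: the bytes are decoded from UTF-8 once when the content is loaded, and the reader then hands out whole code points as integers. The class CodePointReader, its next_char method, and the EOF value of -1 are assumptions made for illustration.

```python
# Sketch only: decode UTF-8 up front, then iterate over whole code points.
# CodePointReader, next_char and EOF are hypothetical names for this example.

class CodePointReader:
    EOF = -1  # assumed end-of-input marker; no valid code point is negative

    def __init__(self, data: bytes):
        # UTF-8 decoding happens here, once, when the content is read.
        # Malformed input raises UnicodeDecodeError (strict mode).
        self._text = data.decode("utf-8")
        self._pos = 0

    def next_char(self) -> int:
        """Return the next code point as an int, or EOF at the end."""
        if self._pos >= len(self._text):
            return self.EOF
        code_point = ord(self._text[self._pos])
        self._pos += 1
        return code_point


content = '#VRML V2.0 utf8\nWorldInfo { title "Grüße 😀" }\n'.encode("utf-8")
reader = CodePointReader(content)

code_points = []
while (cp := reader.next_char()) != CodePointReader.EOF:
    code_points.append(cp)

# Every element is one complete character, including the one above U+FFFF.
print(len(content), "bytes ->", len(code_points), "code points")
```

With this arrangement the tokenizer logic never sees partial UTF-8 sequences or surrogate halves; it only compares integers.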