antlr / antlr4-lab

A client/server for trying out and learning about ANTLR
MIT License

Unicode chars >= \0080 not rendered properly in the tree view #68

Open wernerdaehn opened 1 year ago

wernerdaehn commented 1 year ago

The tree rendering does not show non-ASCII chars (\u0080-\uFFFF) correctly.

Simple grammar:

grammar trivial;
options { caseInsensitive = true; }

WORD: ([A-Z_0-9] | [\u0080-\uFFFF])+ ;

script: WORD* EOF ;

Input text: aouÄÖÜ
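As a sanity check that the input really is a single WORD, the lexer rule can be approximated with a regular expression. This is only a sketch: it uses Python's `re` module as a stand-in for ANTLR, assumes `re.IGNORECASE` behaves closely enough to `caseInsensitive = true` for this input, and `is_single_word` is a hypothetical helper name.

```python
import re

# Rough regex stand-in for the WORD lexer rule; this is not ANTLR itself.
# re.IGNORECASE approximates `options { caseInsensitive = true; }`.
WORD = re.compile(r"[A-Z_0-9\u0080-\uFFFF]+", re.IGNORECASE)


def is_single_word(text: str) -> bool:
    """Return True if the whole text matches the WORD approximation."""
    return WORD.fullmatch(text) is not None


# "aou" followed by A-umlaut, O-umlaut, U-umlaut (the input from the report).
sample = "aou\u00C4\u00D6\u00DC"

print(is_single_word(sample))
# Show the code point of every character in the sample.
print([f"U+{ord(ch):04X}" for ch in sample])
```

If the approximation holds, the whole input is one token, which would suggest the grammar is not the source of the rendering problem.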

Result:

image

Hierarchy screen is fine.

image

parrt commented 1 year ago

Well that is interesting! I would bet we have an issue with encoding and decoding input somehow. I know that the tree visualization definitely is capable of showing non-ASCII characters.

BTW, is \uFFFF a valid character?
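On the \uFFFF question: U+FFFF is a code point in the Basic Multilingual Plane, but Unicode designates it a noncharacter, so it is not assigned to any character. One way to check this is with Python's standard `unicodedata` module, as in the sketch below (`general_category` is a hypothetical helper name). Note that the range \u0080-\uFFFF also covers the surrogate block.

```python
import unicodedata


def general_category(code_point: int) -> str:
    """Return the Unicode general category of a code point."""
    return unicodedata.category(chr(code_point))


# Compare the upper bound of the range with an ordinary letter and a surrogate.
for cp in (0x00C4, 0xD800, 0xFFFF):
    print(f"U+{cp:04X}", general_category(cp))
```

Here `Lu` means uppercase letter, `Cs` means surrogate, and `Cn` means unassigned, which is the category noncharacters fall under.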

wernerdaehn commented 1 year ago

I did not check whether \uFFFF is a valid char; my rule simply says "consume all 8-bit chars or higher as is". I just inspected the page: the tree is rendered as SVG, but the chars are rendered as paths instead of plain text!

The output is e.g. <path style="stroke:none;" d="M 3.03125 -3.578125 L 3.03125 -3.0625 C 3.015625 -2.5625 2.859375 -2.328125 2.140625 -1.6875 C 1.3125 -0.953125 1.046875 -0.46875 1.046875 0.28125 C 1.046875 1.5625 1.953125 2.390625 3.40625 2.390625 C 5 2.390625 5.8125 1.515625 5.8125 -0.1875 L 4.875 -0.1875 C 4.875 1 4.4375 1.53125 3.453125 1.53125 C 2.59375 1.53125 2.03125 1.015625 2.03125 0.28125 C 2.03125 -0.234375 2.28125 -0.65625 2.859375 -1.171875 C 3.8125 -2 4.015625 -2.328125 4.015625 -2.96875 L 4.015625 -3.578125 Z M 3.03125 -4.625 L 4.015625 -4.625 L 4.015625 -5.765625 L 3.03125 -5.765625 Z M 3.03125 -4.625 "/> </symbol>

Whereas I would have expected <text id="gText_11081308229940" name="-1" x="790.251953" y="-631.9517" font="Arial" rotate="0" horizAnchor="middle" vertAnchor="middle" scale="4,4" width="1" stroke="0x000000">aou&#x00C4;&#x00D6;&#x00DC;</text>
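The expected `<text>` content above writes the non-ASCII characters as XML numeric character references. A minimal sketch of that escaping follows, assuming a four-digit uppercase hex form like the one shown; `to_ascii_xml_text` is a hypothetical helper, not part of antlr4-lab.

```python
def to_ascii_xml_text(text: str) -> str:
    """Escape text for XML, writing non-ASCII chars as &#xNNNN; references."""
    markup = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
    out = []
    for ch in text:
        if ch in markup:
            # Characters that are special in XML text content.
            out.append(markup[ch])
        elif ord(ch) < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Everything from U+0080 upward becomes a hex character reference.
            out.append(f"&#x{ord(ch):04X};")
    return "".join(out)


# "aou" followed by A-umlaut, O-umlaut, U-umlaut (the input from the report).
print(to_ascii_xml_text("aou\u00C4\u00D6\u00DC"))
```

For the input from the report this produces the same string as the expected `<text>` body, `aou&#x00C4;&#x00D6;&#x00DC;`.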

parrt commented 1 year ago

Wow. That certainly doesn't look right, haha. Thanks for posting.