antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

[Feature] Parsing of binary streams instead of utf-8? #3532

Open bendgk opened 2 years ago

bendgk commented 2 years ago

ANTLR should support the parsing of raw binary streams (1 byte per input character) alongside UTF-8 characters.

It could be made available as an encoding, for example: antlr4 -encoding raw Grammar.g4

For example, it is currently not possible to match the single byte 0x80, because \u0080 is encoded in UTF-8 as the two bytes 0xC2 0x80.
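The encoding mismatch described above can be checked outside ANTLR. The following is a minimal sketch in plain Python, not ANTLR code; the helper names `utf8_hex` and `is_valid_utf8` are made up for illustration. It shows that code points up to U+007F encode to one byte, that U+0080 encodes to two, and that a lone 0x80 byte is not valid UTF-8 at all.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as lowercase hex (hypothetical helper)."""
    return chr(code_point).encode("utf-8").hex()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_hex(0x7F))              # 7f   (one byte)
print(utf8_hex(0x80))              # c280 (two bytes)
print(is_valid_utf8(b"\x80"))      # False: a lone continuation byte
print(is_valid_utf8(b"\xc2\x80"))  # True: the UTF-8 form of U+0080
```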

parrt commented 2 years ago

Hi. I'm not super familiar with all the details of Unicode and the various encodings. ANTLR can open files with whatever format the file is in. I don't know if there's such a thing as UTF-8 that is UTF-8 plus some other stuff. Can you correct me? Thanks!
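One possible reading of "whatever format the file is in": when bytes are decoded as ISO-8859-1 (Latin-1), every byte value 0x00 to 0xFF maps to the code point with the same number, so a byte 0x80 becomes the character \u0080. The sketch below shows only that round trip in plain Python. The helper names are hypothetical, and whether feeding such a stream to a generated lexer covers the request in this issue is an assumption, not something confirmed in the thread.

```python
def bytes_to_chars(data: bytes) -> str:
    """Decode raw bytes as Latin-1: one character per byte (hypothetical helper)."""
    return data.decode("latin-1")


def chars_to_bytes(text: str) -> bytes:
    """Encode Latin-1 characters back to the original bytes (hypothetical helper)."""
    return text.encode("latin-1")


raw = bytes(range(256))      # every possible byte value once
text = bytes_to_chars(raw)

print(len(text) == len(raw))        # True: one character per byte
print(text[0x80] == "\u0080")       # True: byte 0x80 becomes U+0080
print(chars_to_bytes(text) == raw)  # True: the round trip is lossless
```

In the Java runtime, the corresponding step would presumably be to pass an explicit charset when creating the input stream rather than relying on the UTF-8 default.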