GaloisInc / language-rust

Parser and pretty-printer for the Rust language
https://hackage.haskell.org/package/language-rust
BSD 3-Clause "New" or "Revised" License

`language-rust` lexer rejects Unicode symbols that `rustc` accepts #3

Closed. RyanGlScott closed this issue 1 week ago.

RyanGlScott commented 3 weeks ago

Per the Rust Reference, Rust permits any identifier that meets the specification in Unicode Standard Annex #31 for Unicode version 15.0. For example, rustc accepts the following program:

```rust
// test.rs
fn main() {
    let 𝑂_𝑂 = ();
    𝑂_𝑂
}
```

language-rust, on the other hand, fails to lex this program:

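```haskell
-- Main.hs
-- TypeApplications is required for the @(SourceFile Span) annotation
-- unless the compiler already defaults to the GHC2021 language edition.
{-# LANGUAGE TypeApplications #-}
module Main (main) where

import Language.Rust.Data.InputStream
import Language.Rust.Parser
import Language.Rust.Syntax

main :: IO ()
main = do
  -- Read test.rs and try to parse it as a whole source file
  is <- readInputStream "test.rs"
  print $ parse @(SourceFile Span) is
```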
```
$ runghc Main.hs
Left (parse failure at 3:9 (lexical error))
```

My guess is that this part of the lexer needs to be updated to support Unicode 15.0.
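For reference, the Rust Reference gives the identifier grammar as `XID_Start XID_Continue*` or `_ XID_Continue+`. The sketch below approximates that rule in plain Haskell using `Data.Char.generalCategory`. It is only an illustration, not language-rust's lexer (which is generated by Alex), and the helper names (`isXidStartApprox`, `isXidContinueApprox`, `isRustIdentApprox`) are hypothetical. The approximation treats XID_Start as the letter categories plus `Nl`, and XID_Continue as those plus `Mn`, `Mc`, `Nd` and `Pc`. It ignores Other_ID_Start/Other_ID_Continue, the Pattern_Syntax exclusions, the NFKC closure adjustments, and Rust keywords. Its answers for recently added characters also depend on the Unicode version bundled with the GHC `base` library in use.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: approximates UAX #31 XID_Start using general
-- categories only (Lu, Ll, Lt, Lm, Lo, Nl). It ignores Other_ID_Start,
-- the Pattern_Syntax/Pattern_White_Space exclusions, and the NFKC
-- closure adjustments that distinguish XID_Start from ID_Start.
isXidStartApprox :: Char -> Bool
isXidStartApprox c =
  generalCategory c
    `elem` [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
           , ModifierLetter, OtherLetter, LetterNumber ]

-- Hypothetical helper: approximates XID_Continue as XID_Start plus
-- Mn, Mc, Nd and Pc (Pc is the category of '_').
isXidContinueApprox :: Char -> Bool
isXidContinueApprox c =
  isXidStartApprox c
    || generalCategory c
      `elem` [NonSpacingMark, SpacingCombiningMark, DecimalNumber, ConnectorPunctuation]

-- Rust Reference grammar: XID_Start XID_Continue* | '_' XID_Continue+
-- (keywords such as `fn` are not excluded by this sketch)
isRustIdentApprox :: String -> Bool
isRustIdentApprox ('_' : rest) = not (null rest) && all isXidContinueApprox rest
isRustIdentApprox (c : rest) = isXidStartApprox c && all isXidContinueApprox rest
isRustIdentApprox [] = False
```

Loaded into GHCi, `isRustIdentApprox "𝑂_𝑂"` accepts the identifier from `test.rs`, because U+1D442 MATHEMATICAL ITALIC CAPITAL O has general category `Lu`. A lone `"_"` and `"1abc"` are rejected.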

RyanGlScott commented 2 weeks ago

Some assorted notes that I took while investigating this: