cenotelie / hime

Apache License 2.0

Regression for "Refactor NFA transition normalisation" #105

Closed: stevefan1999-personal closed this issue 2 months ago

stevefan1999-personal commented 10 months ago

Commit e48e2179261140276a8c432fb89068e4f1961096 seems to have introduced a regression in NFA generation. The following grammar reproduces it:

grammar C
{
    options
    {
        Axiom = "translation_unit";
        Separator = "SEPARATOR";
    }
    terminals
    {
        // A.1.1 Line terminators
        NEW_LINE        -> U+000D /* CR */
                        |  U+000A /* LF */
                        |  U+000D U+000A /* CR LF */
                        |  U+0085 // Next line character
                        |  U+2028 // Line separator character
                        |  U+2029 ; //Paragraph separator character (U+2029)

        // A.1.2 White space
        WHITE_SPACE     -> uc{Zs} | U+0009 | U+000B | U+000C ;

        // A.1.3 Comments
        COMMENT_LINE    -> '//' (.* - (.* NEW_LINE .*)) ;
        COMMENT_BLOCK   -> '/*' (.* - (.* '*/' .*)) '*/' ;

        // A.1.6 Identifiers
        // fragment IDENTIFIER_CHAR     -> uc{Lu} | uc{Ll} | uc{Lt} | uc{Lm} | uc{Lo} | uc{Nl} ;
        // IDENTIFIER           -> (IDENTIFIER_CHAR | '_') (IDENTIFIER_CHAR | '_' | uc{Nd} | uc{Pc} | uc{Cf})* ;
        IDENTIFIER          ->  [a-zA-Z_] [a-zA-Z0-9_]* ;

        // A.1.8 Literals
        INTEGER_LITERAL_DECIMAL     -> ('0' | [1-9] [0-9]*) ([Uu] [Ll]? | [Ll] [Uu]? )? ;
        INTEGER_LITERAL_HEXA        -> '0' [xX] [a-fA-F0-9]+ ([Uu] [Ll]? | [Ll] [Uu]? )? ;

        fragment EXPONENT -> [eE] ('+'|'-')? ('0' | [1-9] [0-9]*) ;
        fragment REAL_LITERAL_SUFFIX -> [FfDdMm] ;
        REAL_LITERAL                -> ('0' | [1-9] [0-9]*)? '.' ('0' | [1-9] [0-9]*) EXPONENT? REAL_LITERAL_SUFFIX?
                                    |  ('0' | [1-9] [0-9]*) EXPONENT  REAL_LITERAL_SUFFIX?
                                    |  ('0' | [1-9] [0-9]*) REAL_LITERAL_SUFFIX;
        fragment HEX_ESCAPE_LITERAL -> '\\' 'x' [a-fA-F0-9]{1,4}
                                            | '\\' [uU] [a-fA-F0-9]{4} ([a-fA-F0-9]{4})? ;
        CHARACTER_LITERAL           -> '\'' ( (. - ('\'' | '\\' | NEW_LINE))
                                            | '\\' ('\'' | '"' | '\\' | [0abfnrtv])
                                            | HEX_ESCAPE_LITERAL
                                        ) '\'' ;
        STRING_LITERAL      -> '"'  ( (. - ('"' | '\\' | NEW_LINE))
                                            | '\\' ('\'' | '"' | '\'' | '\\' | [0abfnrtv])
                                            | HEX_ESCAPE_LITERAL
                                        )* '"' ;

        SEPARATOR       -> (NEW_LINE | WHITE_SPACE | COMMENT_LINE | COMMENT_BLOCK)+;
    }
}

Panic:

thread '<unnamed>' panicked at F:\Git\github.com\stevefan1999-personal\hime\sdk-rust\src\lib.rs:74:9:
  invalid char span: 2029-2028

This corresponds to the last two alternatives of NEW_LINE:

                        |  U+2028 // Line separator character
                        |  U+2029 ; //Paragraph separator character (U+2029)
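
For illustration, the panic message suggests that a character span was built with its begin after its end (2029 before 2028). The Rust sketch below is not Hime's actual code: `CharSpan`, `CharSpan::new`, and `normalise` are hypothetical names, and the use of inclusive spans over 16-bit code units is an assumption. It shows the invariant that appears to have been violated, together with one way to split overlapping spans into disjoint pieces that can never come out inverted.

```rust
/// Hypothetical inclusive span of 16-bit code units (not Hime's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CharSpan {
    begin: u16,
    end: u16,
}

impl CharSpan {
    /// Returns `None` for an inverted span such as 2029-2028
    /// instead of panicking.
    fn new(begin: u16, end: u16) -> Option<CharSpan> {
        if begin <= end {
            Some(CharSpan { begin, end })
        } else {
            None
        }
    }
}

/// Splits possibly overlapping spans into sorted, disjoint pieces.
/// Every piece lies between two consecutive cut points, so its begin
/// is always less than or equal to its end.
fn normalise(spans: &[CharSpan]) -> Vec<CharSpan> {
    // Cut points: each span's begin, and the position just past its end.
    // u32 is used so that `end + 1` cannot overflow at 0xFFFF.
    let mut cuts: Vec<u32> = Vec::new();
    for span in spans {
        cuts.push(u32::from(span.begin));
        cuts.push(u32::from(span.end) + 1);
    }
    cuts.sort_unstable();
    cuts.dedup();

    let mut pieces = Vec::new();
    for pair in cuts.windows(2) {
        let (lo, hi) = (pair[0], pair[1] - 1);
        // Keep the piece only if at least one input span covers it.
        let covered = spans
            .iter()
            .any(|s| u32::from(s.begin) <= lo && hi <= u32::from(s.end));
        if covered {
            pieces.push(CharSpan {
                begin: lo as u16,
                end: hi as u16,
            });
        }
    }
    pieces
}
```

With the two adjacent single-character spans from the grammar, U+2028 and U+2029, `normalise` returns them unchanged, while `CharSpan::new(0x2029, 0x2028)` returns `None` rather than constructing the inverted span reported in the panic.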

woutersl commented 2 months ago

Thank you for your feedback; this is now fixed on master.