Storyyeller / Krakatau

Java decompiler, assembler, and disassembler
GNU General Public License v3.0

assembler: parsing error for identifiers starting with "INF" #54

Status: Closed (skochinsky closed this issue 8 years ago)

skochinsky commented 8 years ago

Simple example:

.class A
.super java/lang/Object

  .field public static final INFO_TYPE_SERIAL_NUMBER S

  .field public static final INFO_TYPE_SUBJECT S

  .field public static final INFO_TYPE_SUBJECT_ALTERNATIVE_NAME S

  .field public static final INFO_TYPE_SUBJECT_RAW S

.end class

produces:

Syntax error at line 4: unexpected token u'INF'
Expected: SYNTHETIC, OP_FIELD, OP_INT, OP_CLASS, TOP, OBJECT, OP_WIDE, OP_LOOKUPSWITCH, OP_CLASS_INT, OP_LBL, PROTECTED, STATIC, SAME, METHODTYPE, OP_NONE, SAME_EXTENDED, NULL, PARAMETER, FINAL, LOCALS, WORD, DEFAULT, INVOKEDYNAMIC, SAME_LOCALS_1_STACK_ITEM_EXTENDED, UTF8, CPINDEX, OP_METHOD_INT, PRIVATE, CHOP, TO, APPEND, INTEGER, ARRAY, STACK, FULL, STRING, OP_DYNAMIC, IS, ENUM, UNINITIALIZEDTHIS, METHOD, FIELD, OP_LDC2, OP_TABLESWITCH, OP_LDC1, METHODHANDLE, OP_METHOD, SAME_LOCALS_1_STACK_ITEM, UNINITIALIZED, FROM, STRING_LITERAL, INT, INTERFACEMETHOD, FLOAT, OP_INT_INT, OP_NEWARR, CLASS, TRANSIENT, VOLATILE, DOUBLE, USING, LONG, PUBLIC, NAMEANDTYPE
Found: DOUBLE_LITERAL
Current stack: [$end, sep, classwithends, version_opt, class_directive_lines, classdec, superdec, interfacedecs, class_directive_lines, topitems, LexToken(D_FIELD,u'.field',5,38), fflags, LexToken(FINAL,u'final',5,59)]

The following change in tokenize.py seems to fix it:

float_base = r'''(?:
    [Nn][Aa][Nn]|                                       #NaN
    [-+]?(?:                                            #Inf and normal both use sign
        [Ii][Nn][Ff]\b|                                   #Inf
        \d+\.\d*(?:[eE][+-]?\d+)?|                         #decimal float
        \d+[eE][+-]?\d+|                                   #decimal float with no fraction (exponent mandatory)
        0[xX][0-9a-fA-F]*\.[0-9a-fA-F]+[pP][+-]?\d+        #hexadecimal float
        )
    )
'''

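(added \b after [Ii][Nn][Ff] on line 4 of the pattern, so that "Inf" only matches when it is not followed by another word character)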
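To illustrate why the \b matters, here is a minimal sketch that tests the float pattern in isolation with Python's re module. It assumes the pattern is compiled in verbose mode and that the original version was identical apart from the missing \b. The names FLOAT_OLD, FLOAT_NEW and float_prefix are hypothetical and not part of Krakatau.

```python
import re

# Assumed original pattern: identical to the proposed one, minus the \b.
FLOAT_OLD = r'''(?:
    [Nn][Aa][Nn]|
    [-+]?(?:
        [Ii][Nn][Ff]|
        \d+\.\d*(?:[eE][+-]?\d+)?|
        \d+[eE][+-]?\d+|
        0[xX][0-9a-fA-F]*\.[0-9a-fA-F]+[pP][+-]?\d+
        )
    )
'''

# Proposed pattern: a word boundary is required right after "Inf".
FLOAT_NEW = FLOAT_OLD.replace('[Ii][Nn][Ff]|', r'[Ii][Nn][Ff]\b|')


def float_prefix(pattern, text):
    """Return the float literal matched at the start of text, or None."""
    m = re.compile(pattern, re.VERBOSE).match(text)
    return m.group(0) if m else None


if __name__ == '__main__':
    for name in ('INFO_TYPE_SUBJECT', '-Inf', 'NANOS'):
        print(name, float_prefix(FLOAT_OLD, name), float_prefix(FLOAT_NEW, name))
```

Under these assumptions, the old pattern consumes the first three characters of INFO_TYPE_SUBJECT as a float literal, while the new one does not match the identifier at all. The NaN alternative has no \b in either version, so an identifier such as NANOS would presumably still be split the same way.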

Storyyeller commented 8 years ago

Good catch. Should I change the way infinity/nan are represented in the next version to avoid the ambiguity?

Storyyeller commented 8 years ago

In the next version, I'm planning to require that Infinity/NaN begin with a sign to avoid the ambiguity. I'm also requiring that all tokens be whitespace-separated (which was supposed to be true already, but apparently PLY doesn't work that way).
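As a rough sketch of what the two planned rules could look like in a regular expression (this is not the actual Krakatau implementation): the sign is mandatory, and a lookahead requires whitespace or end of input after the literal. The spellings "Infinity" and "NaN", the name SPECIAL_FLOAT, and the helper function are all assumptions made for illustration.

```python
import re

# Hypothetical rule: a mandatory sign, then Infinity or NaN, and the literal
# must be followed by whitespace or the end of the input.
SPECIAL_FLOAT = re.compile(r'[-+](?:Infinity|NaN)(?=\s|$)')


def starts_with_special_float(text):
    """Return True if text begins with a signed Infinity/NaN literal."""
    return SPECIAL_FLOAT.match(text) is not None


if __name__ == '__main__':
    for sample in ('+Infinity S', '-NaN', 'INFO_TYPE_SUBJECT', 'Infinity', '+InfinityFoo'):
        print(repr(sample), starts_with_special_float(sample))
```

With a mandatory sign, no identifier can be mistaken for a special float, since identifiers such as INFO_TYPE_SUBJECT never begin with + or -. The lookahead additionally rejects a literal that runs straight into other characters.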