AFLplusplus / Grammar-Mutator

A grammar-based custom mutator for AFL++
Apache License 2.0

Update f1_c_gen.py #50

Status: closed (20urc3 closed this 3 months ago)

20urc3 commented 3 months ago

Fix #46

This commit addresses a UnicodeEncodeError that occurred when attempting to serialize TreeNode objects containing Unicode characters outside the Latin-1 range (0-255). The specific error was triggered by the character '\u2421'.
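
The failure can be reproduced on its own. The snippet below is a standalone illustration, not code from f1_c_gen.py. It encodes the offending character with both codecs and shows that only UTF-8 succeeds.

```python
# U+2421 (SYMBOL FOR DELETE) lies outside Latin-1's 0-255 range.
ch = "\u2421"

latin1_error = ""
try:
    ch.encode("latin-1")
except UnicodeEncodeError as exc:
    # Same message as the one in the make output quoted in this PR.
    latin1_error = str(exc)
print("latin-1:", latin1_error)

# UTF-8 can represent every Unicode code point; U+2421 takes three bytes.
utf8_bytes = ch.encode("utf-8")
print("utf-8:", utf8_bytes)  # utf-8: b'\xe2\x90\xa1'
```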

Changes:

  1. Modified TreeNode.to_bytes() method:

    • Replaced Latin-1 encoding with UTF-8 for broader Unicode support.
    • Updated val_len to store the byte length of the UTF-8 encoded string instead of the character count.
  2. Updated TreeNode.from_bytes() method:

    • Changed decoding from Latin-1 to UTF-8 to match the new encoding.
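
The two changes can be sketched as a minimal round trip. The sketch assumes a hypothetical, stripped-down TreeNode that carries only a rule_id and a string val, framed as a little-endian uint32 header followed by the encoded bytes. The real TreeNode in f1_c_gen.py has more fields and its own layout. What the sketch illustrates is the order of operations: encode first, record the byte length as val_len, then decode with the same codec.

```python
import struct

# Assumed header layout for this sketch: uint32 rule_id, uint32 val_len.
HEADER = struct.Struct("<II")


class TreeNode:
    """Hypothetical, simplified node; not the real class from f1_c_gen.py."""

    def __init__(self, rule_id: int, val: str) -> None:
        self.rule_id = rule_id
        self.val = val

    def to_bytes(self) -> bytes:
        # Encode first, then measure: val_len must count encoded bytes,
        # not characters, because non-ASCII code points span several bytes.
        val_bytes = self.val.encode("utf-8")
        val_len = len(val_bytes)
        return HEADER.pack(self.rule_id, val_len) + val_bytes

    @classmethod
    def from_bytes(cls, data: bytes) -> "TreeNode":
        rule_id, val_len = HEADER.unpack_from(data, 0)
        start = HEADER.size
        val_bytes = data[start:start + val_len]
        # Decode with the same codec used in to_bytes().
        return cls(rule_id, val_bytes.decode("utf-8"))


node = TreeNode(7, "a\u2421b")
blob = node.to_bytes()
restored = TreeNode.from_bytes(blob)
print(len(node.val), len(blob) - HEADER.size, restored.val == node.val)  # 3 5 True
```

Storing len(self.val) instead would record 3 for "a\u2421b" while the payload is 5 bytes. from_bytes would then slice through the middle of the three-byte sequence for U+2421, and the decode would raise UnicodeDecodeError.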

These modifications allow the serialization and deserialization of TreeNode objects containing any valid Unicode character, resolving the UnicodeEncodeError while maintaining compatibility with the existing byte structure.

Note: ASCII text serializes to the same bytes as before, but characters in the U+0080 to U+00FF range now take two bytes instead of one, so serialized data containing them grows slightly. In exchange, every Unicode character in the grammar is handled correctly.
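
To put rough numbers on the size note, the helper below compares the encoded length of a single character under each codec. It is a hypothetical illustration and not part of the PR. None marks a character that Latin-1 cannot encode at all.

```python
from typing import Optional, Tuple


def encoded_sizes(text: str) -> Tuple[Optional[int], int]:
    """Return (Latin-1 size or None if unencodable, UTF-8 size) in bytes."""
    try:
        latin1: Optional[int] = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None
    return latin1, len(text.encode("utf-8"))


for sample in ["A", "\u00e9", "\u2421"]:
    print(ascii(sample), encoded_sizes(sample))
# 'A' (1, 1)
# '\xe9' (1, 2)
# '\u2421' (None, 3)
```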

20urc3 commented 3 months ago

This allows grammars/javascript.json to be compiled out of the box with make GRAMMAR_FILE=grammars/javascript.json. Until now that build was broken and failed with:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2421' in position 0: ordinal not in range(256)
make: *** [GNUmakefile:102: src/f1_c_fuzz.c] Error 1

vanhauser-thc commented 3 months ago

thank you!