ThinkOpenly / sail

Sail architecture definition language
Other
9 stars 10 forks source link

JSON parsing passes through and emits Unicode characters that Python's json module can't parse #14

Open ThinkOpenly opened 4 months ago

ThinkOpenly commented 4 months ago

In the RISC-V Sail specification, model/riscv_insts_vext_vset.sail:

function clause execute VSETI_TYPE(ma, ta, sew, lmul, uimm, rd) = {
[...]
  /* set vl according to VLMAX and AVL */
  vl = if AVL <= VLMAX then to_bits(sizeof(xlen), AVL)
       else if AVL < 2 * VLMAX then to_bits(sizeof(xlen), (AVL + 1) / 2)
       else to_bits(sizeof(xlen), VLMAX);
  /* Note: ceil(AVL / 2) ≤ vl ≤ VLMAX when VLMAX < AVL < (2 * VLMAX)
   * TODO: configuration support for either using ceil(AVL / 2) or VLMAX
   */

Note that the code uses <=, but the comment uses .

In the JSON emitted by the JSON backend, the latter Unicode character trips up the Python (3.10.12) "json" module:

$ python3 -c "import json; f = open('json.out'); j = json.load(f); print(json.dumps(j),indent=2)"
[...]
json.decoder.JSONDecodeError: Invalid \escape: line 2090 column 1290 (char 403777)

We already use String.escaped before emitting the function text. sigh

vishisht-dubey commented 4 months ago

Hi @ThinkOpenly what is the expected behaviour would you please elaborate??

ThinkOpenly commented 4 months ago

what is the expected behaviour would you please elaborate??

I'm not sure exactly what is being asked. There are two places in the RISC-V Sail code where Unicode characters are embedded in the comments within a function clause execute. The JSON backend is, as of very recently, extracting the code in the body of the function verbatim and emitting it in JSON, as an escaped (by OCaml's String.escaped) string. When that JSON is read by Python (3.10.12), those characters result in an error being emitted by the Python JSON decoder, as shown above.

Expected behavior is that the function body code in the Sail specification can be encoded as a JSON string and parsed by Python's JSON decoder without error.

I'm not asserting that the Sail comments are wrong, but it does seem like using different characters between the code ("<=") and the comments ("") could cause confusion, and arguably should be discouraged.

When I replaced the single character by the two-character sequence, the flow did not emit any errors. Unless there is great need for the single-character representation, a simple replacement by the two-character representation may be the best solution. This should happen upstream.