commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript
http://commonmark.org

An error in json test case #742

Closed arvillion closed 1 year ago

arvillion commented 1 year ago

I believe there's an error in example 333 of https://spec.commonmark.org/0.30/spec.json

The character "b" should be surrounded by Unicode whitespace (code point 160, U+00A0), as shown in https://spec.commonmark.org/0.30/#example-333. However, the JSON file uses regular spaces (ASCII code 32).

dbuenzli commented 1 year ago

Looks fine to me. The raw data has c2 a0 byte sequences, which are the UTF-8 encoding of U+00A0 (NO-BREAK SPACE). If you see something else, some step in your pipeline may be altering the data.

> xxd spec.json
...
00012190: 2020 7b0a 2020 2020 226d 6172 6b64 6f77    {.    "markdow
000121a0: 6e22 3a20 2260 c2a0 62c2 a060 5c6e 222c  n": "`..b..`\n",
000121b0: 0a20 2020 2022 6874 6d6c 223a 2022 3c70  .    "html": "<p
000121c0: 3e3c 636f 6465 3ec2 a062 c2a0 3c2f 636f  ><code>..b..</co
000121d0: 6465 3e3c 2f70 3e5c 6e22 2c0a 2020 2020  de></p>\n",.    
000121e0: 2265 7861 6d70 6c65 223a 2033 3333 2c0a  "example": 333,.
000121f0: 2020 2020 2273 7461 7274 5f6c 696e 6522      "start_line"
00012200: 3a20 3539 3337 2c0a 2020 2020 2265 6e64  : 5937,.    "end
00012210: 5f6c 696e 6522 3a20 3539 3431 2c0a 2020  _line": 5941,.  
00012220: 2020 2273 6563 7469 6f6e 223a 2022 436f    "section": "Co
00012230: 6465 2073 7061 6e73 220a 2020 7d2c 0a20  de spans".  },. 
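
The same check can be done without reading hex by eye. The following is a minimal sketch that decodes the bytes of the "markdown" value, copied by hand from the row at offset 000121a0 in the dump above, and lists the code points. The variable names are illustrative only.

```python
# Bytes of the "markdown" string value as they appear in the dump:
# backtick, c2 a0, "b", c2 a0, backtick, then the JSON escape \n (5c 6e).
raw = bytes.fromhex("60c2a062c2a0605c6e")

# Decode as UTF-8 and list each character's code point.
text = raw.decode("utf-8")
codepoints = [f"U+{ord(ch):04X}" for ch in text]
print(codepoints)

# U+00A0 encodes to the two bytes c2 a0 in UTF-8; a regular space is the single byte 20.
print("\u00a0".encode("utf-8").hex(), " ".encode("utf-8").hex())
```

If the file contained regular spaces, the list would show U+0020 where it shows U+00A0.
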
arvillion commented 1 year ago

Thanks. The raw data looks fine. I think the problem is that my browser renders the JSON in a way that displays no-break spaces as regular spaces.
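
For anyone who hits the same confusion, one way to avoid relying on how a viewer draws whitespace is to parse the JSON and print the Unicode name of each whitespace or non-ASCII character. This is only a sketch: `describe_whitespace` and `sample` are hypothetical names, and `sample` is a hand-written stand-in for the example 333 entry rather than the real file.

```python
import json
import unicodedata


def describe_whitespace(s):
    """Return (index, code point, Unicode name) for each whitespace or non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(s)
        if ch.isspace() or ord(ch) > 127
    ]


# Stand-in for the spec.json entry; "\u00a0" puts a real no-break space in the JSON text.
sample = '{"markdown": "`\u00a0b\u00a0`\\n", "example": 333}'

entry = json.loads(sample)
result = describe_whitespace(entry["markdown"])
for item in result:
    print(item)
```

Control characters such as the trailing newline have no Unicode name, so they are reported as `<unnamed>`.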