avakar / pycson

A CoffeeScript Object Notation (CSON) parser for Python 2 and Python 3.

Error parsing JSON file with cson.load #9

Closed mamrehn closed 5 years ago

mamrehn commented 5 years ago

In the README there is the statement

Note that pycson can parse all JSON documents correctly (Coffeescript can't because of whitespace and string interpolations).

So I assumed that any JSON file can be parsed with cson. To check this, I tried to benchmark the parsing time via

import json
import cson
import timeit

# Rather big (~50MB) JSON file I cannot disclose
# But may be able to provide a MWE if required
p = '/path/to/file.json'

def load_json():
  with open(p, 'r') as fp:
    res = json.load(fp)
  return res

def load_cson():
  with open(p, 'r') as fp:
    res = cson.load(fp)
  return res

load_json()  # <-- works fine, therefore file has valid JSON format
load_cson()  # <-- here the error happens

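# timeit runs the statement in its own namespace, so pass this module's globals
tj = timeit.timeit('load_json()', number=10, globals=globals())
tc = timeit.timeit('load_cson()', number=10, globals=globals())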

print(f'Cson parser is {tj / tc}x as fast as json parser')

and got

Traceback (most recent call last):
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 20, in peg
    return p(r)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 74, in __call__
    return r(self, *args, **kw)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/cson/parser.py", line 291, in _p_root
    r = p(_p_simple_value)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 74, in __call__
    return r(self, *args, **kw)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/cson/parser.py", line 224, in _p_simple_value
    p(r'\}')
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 61, in __call__
    self.error(expr=r, err=kw.get('err'))
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 93, in error
    raise _UnexpectedError(st, expr)
speg.peg._UnexpectedError: (<speg.peg._PegState object at 0x7f3714877860>, '\\}')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cson_test.py", line 19, in <module>
    load_cson()
  File "cson_test.py", line 15, in load_cson
    res = cson.load(fp)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/cson/parser.py", line 10, in load
    return loads(fin.read())
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/cson/parser.py", line 17, in loads
    return peg(s.replace('\r\n', '\n'), _p_root)
  File "/home/username/miniconda3/envs/envname/lib/python3.7/site-packages/speg/peg.py", line 24, in peg
    raise ParseError(err.msg, s, offset, err.line, err.col)
speg.peg.ParseError: ("expected '\\\\}', found 'E-8,'", 225, 8, 49)

Do you have an idea of what might be wrong here, or do you need a minimal working example including a JSON file?

avakar commented 5 years ago

Thanks for the report, it seems that pycson incorrectly disallows upper-case E in numbers with exponents.
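
To illustrate the kind of bug described here, the sketch below contrasts a number pattern that only accepts a lower-case exponent marker with one that follows the JSON number grammar and accepts both e and E. The patterns and the names STRICT_LOWER, JSON_NUMBER and matches_fully are illustrative assumptions, not pycson's actual parser code.

```python
import re

# Hypothetical pattern that only allows a lower-case exponent marker.
STRICT_LOWER = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:e[+-]?\d+)?')

# JSON number grammar: the exponent marker may be 'e' or 'E'.
JSON_NUMBER = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?')

def matches_fully(pattern, text):
    """Return True if the whole text is a single number token."""
    return pattern.fullmatch(text) is not None

for token in ('1.2345e-8', '1.2345E-8'):
    print(token, matches_fully(STRICT_LOWER, token), matches_fully(JSON_NUMBER, token))
```

With the stricter pattern, a prefix match on 1.2345E-8 stops after 1.2345 and leaves E-8 unconsumed, which is in line with the "found 'E-8,'" part of the reported error message.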

avakar commented 5 years ago

By the way, you'll probably be very disappointed by the speed. Could you post the numbers here though? I'm interested.

mamrehn commented 5 years ago

Thank you for the fast response! Ah, so the E-8 is probably part of 1.2345E-8 or similar. Is this a bug in speg then?
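
A minimal reproduction along these lines might look like the sketch below. The document string is a made-up example, not taken from the undisclosed file; it only shows that the standard library parser accepts an upper-case exponent marker.

```python
import json

# Hypothetical one-line document with an upper-case exponent marker.
doc = '{"value": 1.2345E-8, "other": 1}'

parsed = json.loads(doc)
print(parsed['value'])

# According to this thread, passing the same string to cson.loads fails
# on pycson releases before 0.8. That call is not executed here.
```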

Regarding the benchmark, here we go:

Setup

import timeit
from urllib.request import urlopen

def test(name, url):
    json_content = urlopen(url).read()

    num_evals = 512
    globals_ = {'json_str': json_content}
    tu = timeit.timeit('loads(json_str)', number=num_evals, globals=globals_,
                       setup='from ujson import loads')  # pip install ujson
    tj = timeit.timeit('loads(json_str)', number=num_evals, globals=globals_,
                       setup='from json import loads')
    tc = timeit.timeit('loads(json_str)', number=num_evals, globals=globals_,
                       setup='from cson import loads')  # pip install cson

    print(f'Testing "{name}" JSON file with {len(json_content)} characters')
    print('Timings: '
          f'{tu / num_evals:0.5f}s ujson parser, ' +
          f'{tj / num_evals:0.5f}s json parser, ' +
          f'{tc / num_evals:0.5f}s cson parser')

    print(f'ujson parser is {tj / tu:0.3f}x faster than json parser')
    print(f'ujson parser is {tc / tu:0.3f}x faster than cson parser')
    print(f'json parser is {tc / tj:0.3f}x faster than cson parser')

if __name__ == '__main__':
    data = {
        'small': 'https://next.json-generator.com/api/json/get/E1WxTbxmL',
        'large': 'https://next.json-generator.com/api/json/get/EkWD0-lX8',
        }

    for name_, url_ in data.items():
        test(name_, url_)
        print()

Results

Testing "small" JSON file with 11030 characters
Timings: 0.00010s ujson parser, 0.00006s json parser, 0.06606s cson parser
ujson parser is 0.604x faster than json parser
ujson parser is 679.771x faster than cson parser
json parser is 1124.894x faster than cson parser

Testing "large" JSON file with 38218 characters
Timings: 0.00024s ujson parser, 0.00032s json parser, 0.42206s cson parser
ujson parser is 1.324x faster than json parser
ujson parser is 1739.524x faster than cson parser
json parser is 1313.395x faster than cson parser

I was not looking for a faster version of the built-in json parser (for this I would use ujson). I was just curious whether the parsing speed is comparable to the standard library parser (which it is not, as you also mentioned).

So I would not use cson to parse JSON files larger than about 100 KB.
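
For anyone who wants to repeat the comparison without depending on the URLs above, here is a rough offline sketch. It generates a synthetic payload and times any loads-style function with timeit. The helper names make_payload and seconds_per_call, the record layout and the payload size are assumptions for illustration; only the standard library parser is timed here.

```python
import json
import timeit

def make_payload(n_items):
    """Build a synthetic JSON string with n_items small records."""
    records = [{'id': i, 'name': f'item-{i}', 'score': i * 0.5}
               for i in range(n_items)]
    return json.dumps(records)

def seconds_per_call(loads, payload, number=20):
    """Average wall-clock seconds for one loads(payload) call."""
    return timeit.timeit(lambda: loads(payload), number=number) / number

payload = make_payload(1000)
t_json = seconds_per_call(json.loads, payload)
print(f'{len(payload)} characters, json: {t_json:0.5f}s per call')

# To compare another parser, pass its loads function instead,
# for example cson.loads if pycson is installed.
```

Passing a callable to timeit avoids the namespace handling that string statements need, at the cost of a small per-call overhead that is negligible next to the parsing time.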

avakar commented 5 years ago

This is now fixed in cson-0.8, thanks again for the report and thank you for the numbers!