brianmario / yajl-ruby

A streaming JSON parsing and encoding library for Ruby (C bindings to yajl)
http://rdoc.info/projects/brianmario/yajl-ruby
MIT License

Issue parsing very large JSON file #143

Closed Altonymous closed 10 years ago

Altonymous commented 10 years ago

I have a very large JSON file and it appears to be having problems parsing it.

Here's a gist of the file: https://gist.github.com/Altonymous/cfa9a4d8102287877d37

You'll need to view the raw file. It's all on one line, not pretty-printed.

I've tried 3 different ruby parsers with no luck.

However, jsonlint.com and python both are able to parse it.

Also of note: it was Python's JSON library that exported the file to begin with.

I tried piping:

cat 2014-06-24.json | ruby -ryajl -e "puts Yajl::Parser.parse(STDIN).inspect"

I tried parsing it in irb:

require 'yajl'
file_path = "/temp/2014-06-24.json"
parser = Yajl::Parser.new
json = File.new(file_path, 'r')
hash = parser.parse(json)

And the error I get is:

Yajl::ParseError: lexical error: invalid char in json text.
                                       {"2269": {"recommended_pps":
                     (right here) ------^

    from (irb):7:in `parse'
    from (irb):7
    from /Users/nunya/.rvm/rubies/ruby-2.1.0/bin/irb:11:in `<main>'

lamont-granquist commented 10 years ago

This works for me:

curl -L https://gist.githubusercontent.com/Altonymous/cfa9a4d8102287877d37/raw/1f3aa004faa2c29b10874cce416a5f50733e14f3/2014-06-29.json | ruby -ryajl -e "puts Yajl::Parser.parse(STDIN).inspect"

That's with yajl-ruby 1.2.1 on a Mac.

Altonymous commented 10 years ago

What ruby version are you using?

I'm using 2.1.0-p0

lamont-granquist commented 10 years ago

2.1.2

Altonymous commented 10 years ago

I just tried it via the curl method and it worked for me as well. So I'm now wondering if there is a hidden character in my actual file that didn't transfer over.
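
One way to test the hidden-character theory is to scan the raw bytes of the local file before handing it to any parser. The sketch below is only an illustration: `suspicious_bytes` is a hypothetical helper, not part of yajl-ruby, and it assumes the exporter wrote plain ASCII JSON, so any control character or non-ASCII byte is worth a look. A UTF-8 byte order mark at offset 0 is one example of a byte sequence that is invisible in most editors and may not survive a copy-paste.

```ruby
# Hypothetical helper (not part of yajl-ruby): list bytes that would be
# unusual in JSON text written by an ASCII-only exporter.
# Returns up to `limit` pairs of [byte offset, hex value].
def suspicious_bytes(data, limit = 10)
  hits = []
  data.each_byte.with_index do |byte, offset|
    # Tab, line feed and carriage return are ordinary JSON whitespace.
    next if byte == 0x09 || byte == 0x0A || byte == 0x0D
    # Printable ASCII is fine too.
    next if byte >= 0x20 && byte < 0x7F
    hits << [offset, format("0x%02X", byte)]
    break if hits.size >= limit
  end
  hits
end

# Demo on an in-memory sample that starts with a UTF-8 byte order mark.
sample = "\xEF\xBB\xBF{\"2269\": {}}".b
p suspicious_bytes(sample)

# For a real file, read it in binary mode so no transcoding happens, e.g.:
#   p suspicious_bytes(File.binread("/temp/2014-06-24.json"))
```

An empty result would suggest the problem lies somewhere other than stray bytes in the file.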

lamont-granquist commented 10 years ago

I tried 'export LC_ALL=C' to turn off UTF-8 and it still works for me, so there are no non-ASCII characters in what you pasted. You might have had a UTF-8 character in the original source while using a non-UTF-8 locale, and it blew up because of that.
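
The locale theory can also be checked from Ruby without changing the shell environment. This is a rough sketch using only core String methods; `encoding_report` is a made-up name for illustration. It reports whether the bytes are pure ASCII, whether they form valid UTF-8, and whether they begin with a UTF-8 byte order mark.

```ruby
# Hypothetical helper: summarise how a byte string relates to ASCII and UTF-8.
def encoding_report(bytes)
  # Work on a copy so the caller's string keeps its original encoding tag.
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  {
    ascii_only: text.ascii_only?,
    valid_utf8: text.valid_encoding?,
    starts_with_bom: bytes.b.start_with?("\xEF\xBB\xBF".b)
  }
end

p encoding_report("{\"2269\": {}}")
p encoding_report("\xEF\xBB\xBF{\"2269\": {}}".b)

# For a real file, read the raw bytes first, e.g.:
#   p encoding_report(File.binread("/temp/2014-06-24.json"))
```

If `ascii_only` comes back true for the local file, the locale setting is unlikely to be the cause.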

lamont-granquist commented 10 years ago

I also tried on a few different builds of 1.9.3 and 2.0.0 that I have kicking around in rvm, and I can't recreate it...

Altonymous commented 10 years ago

Closing this issue out. I downloaded the file again and it worked, so I'm not sure what's going on. It seems very odd: I had someone pairing with me when it kept blowing up over and over. I copy-pasted from the file that wouldn't parse to create the gist.

Unfortunately, I overwrote the file when downloading it again so I don't have the file that was causing issues.

I appreciate you taking a look. If it happens again I'll save the file and figure out a way to host it so others can use the file directly.
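
For anyone who hits something similar, a small sketch of how a failing file could be set aside before re-downloading, so the two copies can be compared later. `preserve` and the backup directory name are hypothetical; the code uses only Ruby's standard `digest` and `fileutils` libraries.

```ruby
require "digest"
require "fileutils"

# Hypothetical helper: copy a problem file into backup_dir under a name that
# includes part of its SHA-256 digest, and return [copy path, full digest].
def preserve(path, backup_dir)
  FileUtils.mkdir_p(backup_dir)
  digest = Digest::SHA256.file(path).hexdigest
  target = File.join(backup_dir, "#{digest[0, 12]}-#{File.basename(path)}")
  FileUtils.cp(path, target)
  [target, digest]
end

# Example call (paths are illustrative):
#   copy, digest = preserve("/temp/2014-06-24.json", "/temp/failing-json")
#   puts "saved #{copy} (sha256 #{digest})"
```

Differing digests between the preserved copy and a fresh download would confirm that the two files are not byte-identical, even if they look the same in an editor.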