evidens / json2csv

Converts JSON files to CSV (pulling data from nested structures). Useful for Mongo data.

ValueError: Extra data: line 2 column 1 - line 1149764 column 1 (char 36204 - 7118897109) #13

vamshing closed this issue 6 years ago

vamshing commented 9 years ago

When I try to outline the JSON file, I get the above error.

Please help.

evidens commented 9 years ago

Please provide the full error trace; the portion you provided is very specific to your data (which looks to be a rather large data set).

It looks like you are trying to parse a JSON file with multiple separate JSON root objects, which isn't part of the JSON spec. Since it's not standard JSON, you might try the --each-line option.
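
For context, a minimal sketch of why this fails (the file name here is hypothetical): standard JSON permits exactly one root value per file, so json.load() raises exactly this "Extra data" error on a Mongo-style export with one document per line, while parsing line by line works.

```python
import json

# Standard JSON allows exactly one root value per file, so json.load()
# raises "ValueError: Extra data" when it hits a second root value.
with open('dump.json') as f:  # hypothetical export, one JSON document per line
    try:
        json.load(f)
    except ValueError as e:
        print(e)  # Extra data: line 2 column 1 ...

# Line-delimited JSON has to be parsed one line at a time instead,
# which is the behaviour --each-line enables.
with open('dump.json') as f:
    docs = [json.loads(line) for line in f if line.strip()]
```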

vamshing commented 9 years ago

Traceback (most recent call last):
  File "json2csv.py", line 148, in <module>
    loader.load(args.json_file)
  File "json2csv.py", line 53, in load
    self.process_each(json.load(json_file))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 1149764 column 1 (char 36204 - 7118897109)

I tried the --each-line option for the outline file generation and it worked fine, but when I tried to convert the JSON to CSV using the outline file, the above error was thrown.

Thanks in advance for helping.

evidens commented 9 years ago

Did you run json2csv with --each-line as well? Mongo data is pretty much the only file format that puts multiple JSON root elements into a single file, so it's a special case, not the default assumption.

vamshing commented 9 years ago

Did that, with --each-line:

Waynes-MacBook-Pro:json2csv vamshiguduguntla$ python json2csv.py /Users/vamshiguduguntla/Documents/04_08_CleanAppData/PlayStore_2015_05.json /Users/vamshiguduguntla/Documents/04_08_CleanAppData/PlayStore_2015_05.outline.json -o /Users/vamshiguduguntla/Documents/04_08_CleanAppData/PlayStore_2015_05.outline.csv --each-line
Killed: 9

The process took a long time, but then it was killed...

evidens commented 9 years ago

Oh, that's a good observation. I created this with the most naive implementation, which means it loads all the data into memory before writing it to a file.

I suspect it would suit your purpose if the output were buffered and periodically flushed to file, so it doesn't require as much memory. I've created a feature branch, 13-direct-transcription, with an experiment that writes directly to the CSV file instead of to an in-memory collection. This should avoid getting the process killed due to memory consumption.
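
The idea is roughly this (a minimal sketch of the streaming approach, not the actual branch code; the field names and file paths are made up):

```python
import csv
import json

# Stream the conversion: parse one JSON line at a time and write the CSV
# row immediately, so memory usage stays flat regardless of input size.
# (Python 2.7 style, matching the traceback above.)
fields = ['id', 'name']  # hypothetical column names

with open('input.json') as json_file, open('output.csv', 'wb') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fields)
    writer.writeheader()
    for line in json_file:
        if not line.strip():
            continue
        doc = json.loads(line)
        # Write the mapped keys straight to disk; nothing accumulates in memory.
        writer.writerow({f: doc.get(f, '') for f in fields})
```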

Let me know how it works out and I can add more tests and such around it.

vamshing commented 9 years ago

The feature branch worked! Thank you very much.

Now I just have to deal with a 12.2 GB CSV file...

Thanks a lot.

sakthivel021 commented 6 years ago

How do I execute this?

$ python /home/user1/work/json2csv.py /home/user1/work/data.json
usage: json2csv.py [-h] [-e] [-o OUTPUT_CSV] [--strings] json_file key_map
json2csv.py: error: too few arguments

evidens commented 6 years ago

Please read the instructions in the README. You need a key-map file to instruct json2csv what data to extract.
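
For reference, the key_map argument in that usage line is a small JSON file mapping CSV column names to paths within each record. A minimal sketch (the column names and paths here are made up; the README documents the exact format):

```json
{
    "map": [
        ["id", "_id"],
        ["title", "details.title"]
    ]
}
```

It is then passed as the second positional argument, e.g. python json2csv.py data.json key_map.json -o data.csv.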

Closing this thread because it's resolved.