evidens / json2csv

Converts JSON files to CSV (pulling data from nested structures). Useful for Mongo data.
MIT License
264 stars, 97 forks

Can't get gen_outline.py to work #12

Closed: iboates closed this issue 9 years ago

iboates commented 9 years ago

Hi, I am trying to get gen_outline.py to work. The docs say to use

python gen_outline.py --collection nodes /path/to/the.json

I am using exactly this:

python gen_outline.py --collection nodes F:\electoral_map\candidates_python\candidates0_to_250.json

And I get KeyError: 'nodes', so I took it out and just tried

python gen_outline.py --collection F:\electoral_map\candidates_python\candidates0_to_250.json

And then it says that it is missing the argument "json_file"

Am I just not entering it right? Please help, I am new to GitHub.
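Why the second attempt reports a missing json_file: assuming gen_outline.py parses its arguments with argparse, with --collection as an option that takes a value and json_file as a required positional (an assumption about the script's interface), --collection consumes the next token. When it is followed directly by the path, the path becomes the collection name and nothing is left for json_file. A minimal sketch of that behavior; the parser below is a hypothetical reconstruction, not the script's actual code:

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of gen_outline.py's CLI: an optional
    # --collection flag that takes a value, plus a required positional json_file.
    parser = argparse.ArgumentParser(prog="gen_outline.py")
    parser.add_argument("--collection", help="top-level key holding the records")
    parser.add_argument("json_file", help="path to the input JSON file")
    return parser


parser = build_parser()

# Correct usage: the flag's value and the positional are separate tokens.
args = parser.parse_args(["--collection", "candidates", "data.json"])
print(args.collection, args.json_file)  # candidates data.json

# Dropping the collection name makes --collection swallow the path,
# so argparse reports that json_file is required and exits with status 2.
try:
    parser.parse_args(["--collection", "data.json"])
except SystemExit as exc:
    print("argparse exited with status", exc.code)  # argparse exited with status 2
```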

evidens commented 9 years ago

Hi, if you have a JSON file with a structure like:

{
  "candidates": [
    {"fullName": "Candidate 1"},
    {"fullName": "Candidate 2"}
  ]
}

You would want to pass in candidates in place of nodes.

python gen_outline.py --collection candidates F:\electoral_map\candidates_python\candidates0_to_250.json

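In other words, the value passed to --collection has to be a top-level key of the file, and that key should hold the list of records. A minimal sketch of the lookup, assuming the script indexes the parsed document with the collection name; get_records is a hypothetical helper, not gen_outline.py's actual code. Listing the top-level keys is a quick way to find the right value:

```python
import json

SAMPLE = '{"candidates": [{"fullName": "Candidate 1"}, {"fullName": "Candidate 2"}]}'


def get_records(doc, collection):
    # Hypothetical helper: plain dict indexing, so a collection name that is
    # not a top-level key raises KeyError (e.g. KeyError: 'nodes').
    return doc[collection]


doc = json.loads(SAMPLE)
print(sorted(doc))  # ['candidates'] -> candidate values for --collection
print([r["fullName"] for r in get_records(doc, "candidates")])
# ['Candidate 1', 'Candidate 2']

try:
    get_records(doc, "nodes")
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'nodes'
```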
iboates commented 9 years ago

I am now getting the error "AttributeError: 'dict' object has no attribute 'iteritems'"

I'm thinking that maybe my JSON is too complicated to be parsed? These are the first three items:

{
"objects": [
{"first_name": "Pascale", "last_name": "D\u00e9ry", "election_name": "House of Commons", "name": "Pascale D\u00e9ry", "elected_office": "candidate", "url": "", "gender": "", "extra": {}, "related": {"boundary_url": "/boundaries/federal-electoral-districts-next-election/24025/", "election_url": "/elections/house-of-commons/"}, "source_url": "http://www.conservative.ca/?member=candidates", "offices": [], "party_name": "Conservative", "incumbent": null, "district_name": "Drummond", "email": "", "personal_url": "http://www.conservative.ca/team/member/?fname=Pascale&lname=D\u00e9ry&type=candidates", "photo_url": "http://www.conservative.ca/media/team/Pascale-Dery.jpg"},
{"first_name": "Christine", "last_name": "Poirier", "election_name": "House of Commons", "name": "Christine Poirier", "elected_office": "candidate", "url": "", "gender": "F", "extra": {"twitter": "https://twitter.com/iciChristine", "facebook": "https://www.facebook.com/iciChristine.ca"}, "related": {"boundary_url": "/boundaries/federal-electoral-districts-next-election/24039/", "election_url": "/elections/house-of-commons/"}, "source_url": "https://www.liberal.ca/candidates/", "offices": [], "party_name": "Liberal", "incumbent": null, "district_name": "", "email": "Christine@iciChristine.ca", "personal_url": "http://christinepoirier.liberal.ca/", "photo_url": "https://www.liberal.ca/files/2014/06/Christine-Poirier-cropped.png"},
{"first_name": "Andrew", "last_name": "Seagram", "election_name": "House of Commons", "name": "Andrew Seagram", "elected_office": "candidate", "url": "", "gender": "M", "extra": {"twitter": "https://twitter.com/AndrewSeagram", "facebook": "https://fb.com/ASeagramNDP"}, "related": {"boundary_url": "/boundaries/federal-electoral-districts-next-election/35032/", "election_url": "/elections/house-of-commons/"}, "source_url": "http://www.ndp.ca/candidates", "offices": [], "party_name": "NDP", "incumbent": null, "district_name": "", "email": "", "personal_url": "http://andrewseagram.ndp.ca", "photo_url": "http://xfer.ndp.ca/2015/-CandidateWebAssets/35032-DON.png"},
...

I used the command:

python gen_outline.py --collection objects F:\electoral_map\candidates_python\candidates0_to_250.json

Thank you very much for your help so far.

EDIT: I formatted one entry of the JSON for easy viewing, in case that helps:

{
"objects": [
{"first_name": "Pascale",
 "last_name": "D\u00e9ry",
 "election_name": "House of Commons",
 "name": "Pascale D\u00e9ry",
 "elected_office": "candidate",
 "url": "",
 "gender": "",
 "extra": {},
 "related":
    {"boundary_url": "/boundaries/federal-electoral-districts-next-election/24025/",
     "election_url": "/elections/house-of-commons/"},
 "source_url": "http://www.conservative.ca/?member=candidates",
 "offices": [],
 "party_name": "Conservative",
 "incumbent": null,
 "district_name": "Drummond",
 "email": "",
 "personal_url": "http://www.conservative.ca/team/member/?fname=Pascale&lname=D\u00e9ry&type=candidates",
 "photo_url": "http://www.conservative.ca/media/team/Pascale-Dery.jpg"},

evidens commented 9 years ago

Oh, that looks like a Python 3 vs. 2 compatibility issue: dict.iteritems() only exists in Python 2. Most of this code is written for Python 2.7 (a lot of people still have 2.x by default, and I was trying to keep the additional requirements to a minimum). If you can run it with Python 2, that's probably the easiest solution.

Otherwise we'll need to create a Python 3-compatible branch. Happy to help a fellow Canadian with an interest in politics.
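For reference, a version-neutral way to walk a record: dict.iteritems() was removed in Python 3, while dict.items() exists in both. The sketch below flattens nested objects such as "related" into dotted column names; the flatten helper and its dotted-key naming are illustrative assumptions, not json2csv's actual algorithm. Lists such as "offices" would be left as-is.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names (illustrative sketch)."""
    flat = {}
    # .items() exists on Python 2 and 3; .iteritems() is Python 2 only.
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects such as "related"; an empty dict
            # like "extra": {} contributes no columns.
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


record = json.loads(
    '{"first_name": "Pascale", "extra": {}, '
    '"related": {"election_url": "/elections/house-of-commons/"}}'
)
print(hasattr({}, "iteritems"))  # False on Python 3
print(sorted(flatten(record).items()))
# [('first_name', 'Pascale'), ('related.election_url', '/elections/house-of-commons/')]
```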

iboates commented 9 years ago

Thanks a lot! I ran it in Python 2.7 and unfortunately hit yet another problem, this time with Unicode characters. For instance, the very first object contains:

"last_name": "D\u00e9ry"

It seems the json2csv script does not like this Unicode character, and it spits out:

Traceback (most recent call last):
  File "json2csv.py", line 155, in <module>
    loader.write_csv(filename=outfile, make_strings=args.strings)
  File "json2csv.py", line 105, in write_csv
    writer.writerows(out)
  File "C:\Python27\lib\csv.py", line 158, in writerows
    return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

I thought it might be an issue with how I was grabbing the data from my source, but the Unicode characters are clearly visible in the output, so I think the problem is in your script. I'm going to look under the hood to see if I can find a way to encode it to UTF-8 properly, but a lot of it is over my head, so I doubt I'll get far.

Thanks again for all your help & replies.

EDIT: I have found that if I change line 91 to

return unicode(item, encoding='utf-8')

I get a different error:

Traceback (most recent call last):
  File "json2csv.py", line 155, in <module>
    loader.write_csv(filename=outfile, make_strings=args.strings)
  File "json2csv.py", line 99, in write_csv
    out = self.make_strings()
  File "json2csv.py", line 82, in make_strings
    for k, val in row.items()})
  File "json2csv.py", line 82, in <dictcomp>
    for k, val in row.items()})
  File "json2csv.py", line 91, in make_string
    return unicode(item, encoding='utf-8')
TypeError: decoding Unicode is not supported

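Why that edit fails: the u'\xe9' in the earlier traceback shows the value is already a unicode object, and in Python 2 unicode(x, encoding=...) only decodes byte strings, so decoding text a second time raises TypeError. A minimal Python 3 sketch of the same distinction, using a hypothetical to_text helper (not json2csv's make_string) that decodes bytes and leaves text alone:

```python
def to_text(item):
    """Hypothetical helper: return item as text without double-decoding."""
    if isinstance(item, bytes):
        # Only raw bytes need decoding.
        return item.decode("utf-8")
    # Already text (or a number): convert, never decode again.
    return str(item)


print(to_text(b"D\xc3\xa9ry"))  # Déry
print(to_text("D\u00e9ry"))     # Déry

try:
    # Python 3 analogue of unicode(u"...", encoding='utf-8') in Python 2.
    str("D\u00e9ry", encoding="utf-8")
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: decoding str is not supported
```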
evidens commented 9 years ago

The csv module in Python 2.7 assumes ASCII by default, but most of my data sources tend to be UTF-8. All you need to do is pip install unicodecsv (it's currently the only requirement in requirements.txt).

JSON always assumes UTF-8, which is why reading it works out of the box. I'll add a note about UTF-8 support to the README.
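For comparison, on Python 3 no extra package is needed: the standard csv module writes Unicode text, and the encoding is chosen when the output file is opened (on Python 2.7, unicodecsv fills this role and encodes to UTF-8 by default). A minimal sketch, assuming a hypothetical write_utf8_csv helper and a sample row taken from the data above:

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows, fieldnames):
    # Python 3: open the file in text mode with an explicit encoding;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


rows = [{"first_name": "Pascale", "last_name": "D\u00e9ry"}]
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "candidates.csv")
    write_utf8_csv(out, rows, ["first_name", "last_name"])
    with open(out, "rb") as f:
        data = f.read()

print(data)  # b'first_name,last_name\r\nPascale,D\xc3\xa9ry\r\n'
```

The é is stored as the two UTF-8 bytes C3 A9, which is what the ASCII-only writer in the first traceback could not produce.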

iboates commented 9 years ago

It's working!

You are officially my favourite person I've met on the internet. Thank you SO much for A. this wonderful script and B. patiently troubleshooting all of this crap for me. I'm very new to GitHub; is there any way I can give you some kind of GitHub "+rep" for this? You've earned it about 100 times over.