centraldedados / datapaka

An easy interface for documenting data packages
GNU General Public License v3.0
19 stars 3 forks source link

[code] changes in line 37 #1

Closed gsilvapt closed 8 years ago

gsilvapt commented 8 years ago

Some comments that I noticed that might be worthy discussing with you:

rlafuente commented 8 years ago

Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...

I've moved these to individual issues in order to better work on each. On to your points:

Thanks for the great questions! :+1:

gsilvapt commented 8 years ago

Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...<

No worries!

Removing special characters: I don't really understand the country names example, can you create a new issue and outline the problem that this means to solve?

My experience packaging data has been enough for me to know one of the biggest pains is to get rid of special characters that will interfere with json files. Some sources, like the World Bank, like to use those and some countries' names actually have special characters on their own. Bosnia and Herzegovina is not the same as Bosnia & Herzegovina and Ivory Coast is also different than Côte D'Ivore. As a matter of fact, spaces usually mess up packages too.

So, to get that working again, I added a line to some of my python scripts I did before to just remove those. If you think this is useful, I sure can help :-)

Apologies for not creating separate issues. Never thought you'd push forward with some of these ideas. Nice to see that happening though!

rlafuente commented 8 years ago

Just came back from dinner -- about the PR contents: the [n] follows the convention on command line tools to show the default option if the user presses enter without typing anything in.

Anyway, I had gotten around the need for that question by naming the output file datapackage-new.json if there's already a datapackage.json present -- this was in e5f5811. Since the issue this PR doesn't apply with the new code, I'm closing this PR. I'm addressing your character issue on another comment below!

rlafuente commented 8 years ago

I'm not sure the character issue should be dealt with in this package, at least until we want to include CSV validation.

I haven't encountered the problem you mentioned when dealing with UTF-8 characters in JSON files; normally I use codecs.open(filename, 'w', 'utf-8) when accessing those files (you do need to import codecs).

In order to produce proper UTF-8 JSON with json.dumps, just use these arguments:

json_contents = json.dumps(data, ensure_ascii=False, encoding='utf-8')

Not sure if this addresses your use case. We can go on discussing this, but an issue would be the best place, can you make one?

gsilvapt commented 8 years ago

There's no need. Potentially, the problems arise when converting something to CSV and it contains special characters. If there is a way to go, then I guess I learned something with this PR :smile: