Closed gsilvapt closed 8 years ago
Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...
I've moved these to individual issues in order to better work on each. On to your points:
slug
: You're right in that it's not the best word, but I'll try to articulate why I picked that term. The data package spec uses title
for the human name and name
for the "slug", which in my view is pretty confusing. The slug
term is borrowed from Wordpress, which I thought would help with familiarity. The fact that this is a command-line application pretty much excludes "average" people, but you're right in that we need to find a clearer term. Perhaps identifier
? (issue #2)Thanks for the great questions! :+1:
Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...<
No worries!
Removing special characters: I don't really understand the country names example, can you create a new issue and outline the problem that this means to solve?
My experience packaging data has been enough for me to know one of the biggest pains is to get rid of special characters that will interfere with json
files. Some sources, like the World Bank, like to use those and some countries' names actually have special characters on their own. Bosnia and Herzegovina is not the same as Bosnia & Herzegovina and Ivory Coast is also different than Côte D'Ivore. As a matter of fact, spaces usually mess up packages too.
So, to get that working again, I added a line to some of my python scripts I did before to just remove those. If you think this is useful, I sure can help :-)
Apologies for not creating separate issues. Never thought you'd push forward with some of these ideas. Nice to see that happening though!
Just came back from dinner -- about the PR contents: the [n]
follows the convention on command line tools to show the default option if the user presses enter without typing anything in.
Anyway, I had gotten around the need for that question by naming the output file datapackage-new.json
if there's already a datapackage.json
present -- this was in e5f5811. Since the issue this PR doesn't apply with the new code, I'm closing this PR. I'm addressing your character issue on another comment below!
I'm not sure the character issue should be dealt with in this package, at least until we want to include CSV validation.
I haven't encountered the problem you mentioned when dealing with UTF-8 characters in JSON files; normally I use codecs.open(filename, 'w', 'utf-8)
when accessing those files (you do need to import codecs
).
In order to produce proper UTF-8 JSON with json.dumps
, just use these arguments:
json_contents = json.dumps(data, ensure_ascii=False, encoding='utf-8')
Not sure if this addresses your use case. We can go on discussing this, but an issue would be the best place, can you make one?
There's no need. Potentially, the problems arise when converting something to CSV and it contains special characters. If there is a way to go, then I guess I learned something with this PR :smile:
Some comments that I noticed that might be worthy discussing with you: