[code] changes in line 37

gsilvapt commented 8 years ago

Some comments that I noticed that might be worthy discussing with you:

Line 46: Slug will cause confusion to the average people. Can't we replace it with something else?
As for the License, in line 56, I'd remove this bit and add a final print explaining why we opted for a PDDL license. This is part of the requirements of OKFN's datapackage definition and it doesn't make much sense to have a private license really. Your choice. Didn't change this bit of code to not mess the other correction. Have a branch ready if we want to apply this feature.
Might be worthy to add some data wrangling bits like removing special characters (-, /, &, \ , etc) from the fields like Country names- can prepare something if you want to. In fact, I already some scripts that do that so it is a matter of just incorporating those in there. Not sure how you guys tested this script so I am not sure if that is taken care throughout the process. I may need to collaborate with you to get this working but I think we will be fine :smile:

rlafuente commented 8 years ago

Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...

I've moved these to individual issues in order to better work on each. On to your points:

slug: You're right in that it's not the best word, but I'll try to articulate why I picked that term. The data package spec uses title for the human name and name for the "slug", which in my view is pretty confusing. The slug term is borrowed from Wordpress, which I thought would help with familiarity. The fact that this is a command-line application pretty much excludes "average" people, but you're right in that we need to find a clearer term. Perhaps identifier? (issue #2)
License: things should be definitely made clearer on why PDDL is the default license, and we need a better selector to pick from the recommended licenses from opendefinition.org (issue #3)
Removing special characters: I don't really understand the country names example, can you create a new issue and outline the problem that this means to solve?

Thanks for the great questions! :+1:

gsilvapt commented 8 years ago

Hey, sorry for having missed this! Setting up my notifications so that I don't miss any more PR's...<

No worries!

Removing special characters: I don't really understand the country names example, can you create a new issue and outline the problem that this means to solve?

My experience packaging data has been enough for me to know one of the biggest pains is to get rid of special characters that will interfere with json files. Some sources, like the World Bank, like to use those and some countries' names actually have special characters on their own. Bosnia and Herzegovina is not the same as Bosnia & Herzegovina and Ivory Coast is also different than Côte D'Ivore. As a matter of fact, spaces usually mess up packages too.

So, to get that working again, I added a line to some of my python scripts I did before to just remove those. If you think this is useful, I sure can help :-)

Apologies for not creating separate issues. Never thought you'd push forward with some of these ideas. Nice to see that happening though!

rlafuente commented 8 years ago

Just came back from dinner -- about the PR contents: the [n] follows the convention on command line tools to show the default option if the user presses enter without typing anything in.

Anyway, I had gotten around the need for that question by naming the output file datapackage-new.json if there's already a datapackage.json present -- this was in e5f5811. Since the issue this PR doesn't apply with the new code, I'm closing this PR. I'm addressing your character issue on another comment below!

rlafuente commented 8 years ago

I'm not sure the character issue should be dealt with in this package, at least until we want to include CSV validation.

I haven't encountered the problem you mentioned when dealing with UTF-8 characters in JSON files; normally I use codecs.open(filename, 'w', 'utf-8) when accessing those files (you do need to import codecs).

In order to produce proper UTF-8 JSON with json.dumps, just use these arguments:

json_contents = json.dumps(data, ensure_ascii=False, encoding='utf-8')

Not sure if this addresses your use case. We can go on discussing this, but an issue would be the best place, can you make one?

gsilvapt commented 8 years ago

There's no need. Potentially, the problems arise when converting something to CSV and it contains special characters. If there is a way to go, then I guess I learned something with this PR :smile:

centraldedados / datapaka

[code] changes in line 37 #1