datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

Add Brazilian federal government import scripts (#60) #63

Closed: augusto-herrmann closed this pull request 10 years ago

augusto-herrmann commented 10 years ago

Creates a script/migrate directory as per #60.

Includes a shell script to download and unpack the data for the Brazilian federal government, as well as a Python script to convert the XML files to the standard CSV format. The Python script has a few dependencies that must be installed before use (see README).
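The conversion step can be pictured roughly as follows. This is a minimal sketch, not the script from this pull request: the source tag names (`orgao`, `nome`, `sigla`), the output column names, and the fixed `jurisdiction_code` value are all assumptions made for illustration. It parses the XML with the standard library's `xml.etree.ElementTree` and writes rows with `csv.DictWriter`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from source XML tags to output CSV columns.
# Neither the tag names nor the column names come from the real import script.
FIELD_MAP = {
    "nome": "title",
    "sigla": "abbreviation",
}
OUTPUT_COLUMNS = ["title", "abbreviation", "jurisdiction_code"]


def xml_to_rows(xml_text):
    """Yield one dict per <orgao> element (hypothetical tag name)."""
    root = ET.fromstring(xml_text)
    for body in root.iter("orgao"):
        row = {column: "" for column in OUTPUT_COLUMNS}
        for tag, column in FIELD_MAP.items():
            value = body.findtext(tag)  # None if the child element is missing
            if value is not None:
                row[column] = value.strip()
        # Assumed constant: every record comes from the Brazilian federal government.
        row["jurisdiction_code"] = "BR"
        yield row


def write_csv(rows, out):
    """Write the rows, with a header line, to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=OUTPUT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = """<orgaos>
      <orgao><nome>Ministério da Fazenda</nome><sigla>MF</sigla></orgao>
      <orgao><nome>Ministério da Saúde</nome><sigla>MS</sigla></orgao>
    </orgaos>"""
    buffer = io.StringIO()
    write_csv(xml_to_rows(sample), buffer)
    print(buffer.getvalue())
```

When writing to a real file, opening it with `open(path, "w", newline="", encoding="utf-8")` avoids blank lines on Windows and keeps accented names intact.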

This merge will also move the previously existing process.py script (which migrates from the old to the new csv schema) to a new migrate directory.
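A schema migration like the one `process.py` performs can be sketched as a column-renaming pass over the CSV. The renames below (`name` to `title`, `acronym` to `abbreviation`) are hypothetical placeholders; the real script's old and new schemas are not reproduced here.

```python
import csv
import io

# Hypothetical renames from an "old" to a "new" schema. Columns not listed
# here are copied through unchanged.
COLUMN_RENAMES = {"name": "title", "acronym": "abbreviation"}


def migrate(old_csv, new_csv):
    """Copy rows from old_csv to new_csv, renaming columns per COLUMN_RENAMES."""
    reader = csv.DictReader(old_csv)
    if reader.fieldnames is None:  # empty input: nothing to migrate
        return
    fieldnames = [COLUMN_RENAMES.get(name, name) for name in reader.fieldnames]
    writer = csv.DictWriter(new_csv, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({COLUMN_RENAMES.get(key, key): value for key, value in row.items()})


if __name__ == "__main__":
    old = io.StringIO("name,acronym,url\nMinistry of Example,ME,http://example.org\n")
    new = io.StringIO()
    migrate(old, new)
    print(new.getvalue())
```

Keeping the migration as a plain function over file-like objects makes it easy to run against real files or against in-memory strings when checking the output.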

rufuspollock commented 10 years ago

Hmmm, looks like you've added all the egg-info stuff - any way to resubmit with just the core code?

augusto-herrmann commented 10 years ago

Of course. I just thought that making it a pip-installable Python package would make it easier to install the dependencies. Are you sure you prefer just the core code?

rufuspollock commented 10 years ago

You don't need egg-info for requirements.txt and pip install to work - I'd just go for requirements.txt and a README (IMO there's no need for a setup.py either).
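As a sketch of why no packaging metadata is needed: pip reads a plain requirements.txt (one requirement per line) via `pip install -r`, so installing the dependencies is a single command. The helper names below are hypothetical; running pip as `python -m pip` with the current interpreter makes the packages land in the same environment that will run the import script.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt"):
    """Build the command that installs everything listed in a requirements file."""
    # sys.executable is the running interpreter, so `-m pip` targets its environment.
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


def install(requirements="requirements.txt"):
    """Run pip; raises CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(requirements), check=True)
```

Calling `install()` from the directory containing requirements.txt is equivalent to typing `python -m pip install -r requirements.txt` in a shell, with no setup.py or egg-info involved.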

augusto-herrmann commented 10 years ago

Good idea. I'll try it like this and resubmit.

augusto-herrmann commented 10 years ago

Done.

Also did a minor update to the import script itself.

rufuspollock commented 10 years ago

Any way you can resubmit with a "clean" patch? Otherwise the files go into the repo and then straight back out.

augusto-herrmann commented 10 years ago

By "clean", do you mean in a single commit?

rufuspollock commented 10 years ago

@augusto-herrmann I meant resubmitting a pull request with the commits "squashed", so that instead of having commits where stuff goes in and then immediately comes back out, it simply never goes in in the first place. If this is a real hassle don't worry - we'll just merge, but a cleaner commit history would be nice.
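The idea behind squashing can be illustrated with a toy model (not how Git actually stores data): treat each commit as a map from file path to new contents, with `None` meaning the file was deleted. Squashing replays the commits and keeps only the net difference from the starting point, so files that were added and later removed disappear entirely. The file names in the demo are hypothetical.

```python
from typing import Dict, List, Optional

# Toy commit: path -> new contents, or None when the commit deletes the file.
Commit = Dict[str, Optional[str]]


def squash(base: Dict[str, str], commits: List[Commit]) -> Commit:
    """Collapse a series of commits into one commit with the same net effect on base."""
    tree = dict(base)
    for commit in commits:
        for path, contents in commit.items():
            if contents is None:
                tree.pop(path, None)
            else:
                tree[path] = contents
    # Keep only paths whose final state differs from the starting state.
    net: Commit = {}
    for path in set(base) | set(tree):
        before, after = base.get(path), tree.get(path)
        if before != after:
            net[path] = after
    return net


if __name__ == "__main__":
    base = {"process.py": "..."}
    # Hypothetical history: packaging files added, then removed in a follow-up commit.
    history = [
        {"setup.py": "...", "example.egg-info/PKG-INFO": "...", "import.py": "v1"},
        {"setup.py": None, "example.egg-info/PKG-INFO": None,
         "requirements.txt": "...", "import.py": "v2"},
    ]
    # setup.py and the egg-info file do not appear in the squashed result.
    print(squash(base, history))
```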

augusto-herrmann commented 10 years ago

@rgrp from what I've figured out so far, I'd need to use the git rebase command. However, every set of instructions or article on the subject I've seen assumes I had previously created a branch that I want to merge into the master branch. But I had worked directly on the master branch of my fork.

So far I'm stuck on this and have no clue as to how to "squash" the commits (short of re-forking and doing everything again).

rufuspollock commented 10 years ago

OK, let's just merge it in - no worries and thanks for all the contributions.

wombleton commented 10 years ago

@augusto-herrmann For the future, you can rewrite your master. git rebase -i good_hash and then git push -f when you're done. Rebasing between branches is just using convenient names instead of a hash. Force pushing things that other people are using is badwrong, though.
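`git rebase -i good_hash` opens an editor where later commits can be marked `squash` or `fixup`. A non-interactive way to get the same single-commit result on master is `git reset --soft good_hash` followed by one `git commit`; the sketch below swaps in that technique because it needs no editor. `good_hash` is a placeholder for the last commit you want to keep, and the helper names are hypothetical. It also swaps `git push -f` for `--force-with-lease`, which refuses to overwrite the remote branch if it has moved since you last fetched.

```python
import shlex
import subprocess
from typing import List


def squash_commands(good_hash: str, message: str,
                    remote: str = "origin", branch: str = "master") -> List[List[str]]:
    """Return the git commands that squash every commit after good_hash into one."""
    return [
        # Move the branch back to good_hash, keeping all later changes staged.
        ["git", "reset", "--soft", good_hash],
        # Record those staged changes as a single new commit.
        ["git", "commit", "-m", message],
        # Rewrite the fork's branch on the remote (only safe if nobody else uses it).
        ["git", "push", "--force-with-lease", remote, branch],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run(squash_commands("abc123", "Add Brazilian import scripts"))` only prints the three commands; passing `dry_run=False` from inside the repository would actually rewrite and force-push the branch.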

augusto-herrmann commented 10 years ago

Thanks for the tip, @wombleton. I'll do that next time.

Actually, coming from Mercurial, this whole "rewriting history" thing that Git encourages seems strange to me. But I'll do it if that's the social norm nowadays.

wombleton commented 10 years ago

Here's the word on rewriting history: https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#fixing-mistakes

I've always taken the "made public" they talk about there to mean when it's pushed to GitHub. I might be wrong, though, as the norm for "made public" seems to be when it's merged into someone else's branch.