datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

implement new import scripts to update Brazilian data #72

Closed augusto-herrmann closed 3 years ago

augusto-herrmann commented 8 years ago

The Brazilian dataset is quite outdated, especially since a recent organizational reform changed many ministries.

The source dataset has changed a lot too. It used to be a big XML dump (which is what the import scripts read), but now it is a RESTful API that returns JSON. The data model and the available fields have also changed, so the scripts need to be completely rewritten.
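To illustrate the kind of rewrite this implies, here is a minimal sketch of turning a JSON response into flat CSV rows. The payload, the `unidades` key, and the field names `codigo`, `nome` and `sigla` are assumptions made up for the example; the real API's data model may look quite different.

```python
import csv
import io
import json

# Hypothetical payload: the key and field names are illustrative only,
# not the actual schema of the source API.
SAMPLE_RESPONSE = """
{"unidades": [
  {"codigo": "26", "nome": "Ministério da Educação", "sigla": "MEC"},
  {"codigo": "25", "nome": "Ministério da Economia", "sigla": "ME"}
]}
"""

CSV_FIELDS = ["code", "name", "abbreviation"]


def json_to_rows(payload, list_key="unidades"):
    """Parse a JSON document and map each item to a flat dict."""
    data = json.loads(payload)
    return [
        {
            "code": item["codigo"],
            "name": item["nome"],
            "abbreviation": item["sigla"],
        }
        for item in data.get(list_key, [])
    ]


def rows_to_csv(rows):
    """Serialize the flat dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = json_to_rows(SAMPLE_RESPONSE)
print(rows_to_csv(rows))
```

Keeping the parsing step separate from the CSV step means only `json_to_rows` would need to change if the API's fields change again.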

todrobbins commented 7 years ago

@augusto-herrmann are you interested in taking this or should I move forward with this script?

augusto-herrmann commented 7 years ago

Sure, but @todrobbins, before I work on this, I'd like to confirm whether or not you have already done any work on it, considering that a couple of months have passed since then.

rufuspollock commented 7 years ago

@augusto-herrmann I don't believe @todrobbins has done any specific work here, so please go ahead!

todrobbins commented 7 years ago

@rufuspollock is correct as I have not done any work in this area. Carry on!

augusto-herrmann commented 7 years ago

I am having difficulty with this as I am unsure about how to handle the changes in organizational structures and how to map identifiers between versions of the organizational structure. I think we need to solve issue #68 before properly handling this.

Alternatively, I could just naïvely run the scripts to obtain the latest version without caring about keeping the ids consistent between the version in use in publicbodies.org (which is quite old) and the current organizational structure of the Brazilian government. Of course, if there is something that currently depends on those ids, it would most likely be broken by this approach.
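As a rough way to gauge how much would break under the naïve approach, one could compare the set of ids in the old file with the set in the newly generated one. The sketch below assumes ids are plain strings; the sample ids are hypothetical and not taken from the actual dataset.

```python
def compare_ids(old_ids, new_ids):
    """Report which ids survive, disappear, or are new after a re-import."""
    old, new = set(old_ids), set(new_ids)
    return {
        "kept": sorted(old & new),      # still resolvable after the update
        "removed": sorted(old - new),   # anything depending on these would break
        "added": sorted(new - old),     # bodies (or renamed bodies) with new ids
    }


# Hypothetical ids, for illustration only.
old_ids = ["br/ministerio-da-fazenda", "br/ministerio-da-educacao"]
new_ids = ["br/ministerio-da-economia", "br/ministerio-da-educacao"]

report = compare_ids(old_ids, new_ids)
print(report["removed"])
```

Note that this only detects ids that changed textually. It cannot tell whether a removed id and an added id refer to the same body after a rename or merger, which is the harder mapping problem described above.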

rufuspollock commented 7 years ago

@augusto-herrmann I've commented in #68. We could try out the solution of #68 here and see how it goes ...

augusto-herrmann commented 3 years ago

> @augusto-herrmann I've commented in #68. We could try out the solution of #68 here and see how it goes ...

In this PR I did not handle or track the changes to the structure of public bodies over time. There have been so many changes over so many years that it would take an "epic" effort to do that. I don't even think that the structure at a given point in time is available as open data, so it would be difficult, if not outright impossible, to obtain.

For now, we just replace the file with the current structure of public bodies in Brazil. The ids are not necessarily consistent with the old data either, because a different algorithm is now used to generate the slugs: we now use the python-slugify package, whereas before it was custom slug-making code (I don't think this library even existed at the time).
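To show why a change of slug algorithm can change ids, here is a small standard-library sketch of a slug function. It is neither the python-slugify implementation nor the old custom code; the rules it applies (accent stripping, lowercasing, a configurable separator) are assumptions chosen only to illustrate the effect.

```python
import re
import unicodedata


def simple_slug(text, separator="-"):
    """Illustrative slug rule: strip accents, lowercase, join words with a separator."""
    # Decompose accented characters, then drop the combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one separator.
    slug = re.sub(r"[^a-z0-9]+", separator, ascii_text.lower())
    return slug.strip(separator)


name = "Ministério da Educação"
print(simple_slug(name))                 # one possible rule
print(simple_slug(name, separator="_"))  # a slightly different rule gives a different id
```

Even a small difference between two slug rules, such as the separator or how accents are handled, yields a different id for the same body, so ids from the old and new files cannot be assumed to match.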