datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

Check we have everything from https://www.gov.uk/government/organisations #7

Open rufuspollock opened 11 years ago

rufuspollock commented 11 years ago

https://www.gov.uk/government/organisations

andylolz commented 11 years ago

I wrote a script to check this. It looks like a few public bodies are missing, and a few sort of match…

What’s the best way to proceed? Update the GB csv? How was the original csv generated?
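The check script itself is not shown in this thread. As a rough sketch of the kind of comparison described, the following lists names from a reference list that have no exact match in a CSV once case and whitespace are normalised. The function names, the `title` column and the sample data are all assumptions made for illustration; they are not taken from the real GB csv or the gov.uk list.

```python
import csv
import io


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())


def missing_names(reference_names, csv_text, column="title"):
    """Return reference names with no normalised exact match in the CSV.

    `column` is a guess at the name column; adjust it to the real header.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    known = {normalise(row[column]) for row in rows}
    return sorted(n for n in reference_names if normalise(n) not in known)


# Illustrative data only (hypothetical rows, not the real dataset).
SAMPLE_CSV = "id,title\ngb/a,Cabinet Office\ngb/b,Home  Office\n"
SAMPLE_REFERENCE = ["Cabinet Office", "Home Office", "Example Agency"]

print(missing_names(SAMPLE_REFERENCE, SAMPLE_CSV))
```

Names that "sort of match" would slip through an exact comparison like this one, so they need a separate fuzzy pass and a manual review.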

andylolz commented 11 years ago

Ah, sorry! The data clearly comes from What Do They Know. It would still be good to see the scraper/converter code, so this could perhaps be altered or added to.

In general, I love the idea of putting data into version control. I don’t know if it’s common practice to store it alongside the tool used to collect it, but that seems like a prudent thing to do.

rufuspollock commented 11 years ago

@andylolz to be clear, the original source data came from What Do They Know, but we aren't confined to that: there are public bodies we would want to list that aren't in What Do They Know (because they may not be FOIable).

So I'd definitely suggest merging your changes into the GB csv (adding the script to scripts and a note in the README).

Regarding pulling more regularly from WDTK, that's probably a new ticket!

rufuspollock commented 10 years ago

@davidread - would you be interested in contributing here?

davidread commented 10 years ago

This little script is good. Humorous to see results like this:

(0.9215686274509803, u'National Institute for Health and Care Excellence', u'National Institute for Health and Clinical Excellence'),

However, I think it would be better to use Nomenklatura to do the matching, rather than running a one-off Levenshtein comparison and then forgetting the manual decisions made. I'll take a look if I get a chance.
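To make the concern about forgotten decisions concrete, here is a minimal sketch of fuzzy matching that consults a stored table of manual decisions before falling back to a similarity score. It is not Nomenklatura and not the script discussed above: it swaps in the standard library's `difflib.SequenceMatcher` for a Levenshtein ratio (for the pair quoted above the two should agree on roughly 0.92), and the `best_match` helper, the 0.9 threshold and the JSON storage format are all hypothetical.

```python
import difflib
import json


def similarity(a, b):
    """Similarity in [0, 1] from difflib; a stand-in for a Levenshtein ratio."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def best_match(name, candidates, decisions, threshold=0.9):
    """Return (match, source); source is 'decision', 'fuzzy' or 'none'.

    `decisions` maps a name to the candidate a reviewer chose, or to None
    when the reviewer decided there is no match. It is checked first, so a
    manual choice is never overridden by the score.
    """
    if name in decisions:
        return decisions[name], "decision"
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored) if scored else (0.0, None)
    if score >= threshold:
        return candidate, "fuzzy"
    return None, "none"


NICE_NEW = "National Institute for Health and Care Excellence"
NICE_OLD = "National Institute for Health and Clinical Excellence"

# First run: no decisions recorded yet, so the score decides.
decisions = {}
print(similarity(NICE_NEW, NICE_OLD))
print(best_match(NICE_NEW, [NICE_OLD, "Cabinet Office"], decisions))

# A reviewer confirms the pair; serialise the table so it can be kept
# in version control next to the data and reloaded on the next run.
decisions[NICE_NEW] = NICE_OLD
saved = json.dumps(decisions, indent=2, sort_keys=True)
print(best_match(NICE_NEW, [NICE_OLD], json.loads(saved)))
```

Keeping the decision table in a plain file is only one option; a reconciliation service would serve the same purpose with a review interface on top.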

rufuspollock commented 10 years ago

@davidread agreed re nomenklatura - good connection here with #2 (reconciliation support via nomenklatura ...)

davidread commented 9 years ago

Just noting another list of UK public bodies to reconcile with and track: http://data.gov.uk/dataset/iati-organisation-identifier-for-uk-government-bodies

bill-anderson commented 9 years ago

@davidread @rgrp this list (created by DFID for IATI reporting) is symptomatic of the problem of having no globally consistent methodology for identifying public bodies. This is an issue that the currently-being-born Joined-up Data Alliance (https://docs.google.com/document/d/1ZcBkxKaY9x31t4LH76yJ7dFMA0uyqyJ9Q-tk3FIE7UE/edit) will be tackling. It would be good to link up the pragmatic approach of public-bodies with the standards approach of JDA.