datasets / publicbodies

A database of public bodies such as government departments, ministries etc.
http://publicbodies.org
MIT License

Suggestions for updating/enhancing fields #78

Open citydelver opened 7 years ago

citydelver commented 7 years ago

Checking the fields at http://data.okfn.org/data/okfn/public-bodies, I can think of a lot of useful, practical information that would be handy to have.

The entries for Brazilian public bodies contain a lot of relevant information, but it is dumped in the "description" field: for example, the law of creation (law n. 11.234/1900) and keywords describing the institution's activity ("research", "education", "legal", "indigenous"...). That information could be put to better use if it were split into specific fields ("law of creation", "activities").
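
A minimal Python sketch of that split, assuming descriptions mention the law in the form "law n. 11.234/1900" and that activities come from a small controlled keyword list. The field names `law_of_creation` and `activities`, the regex, and the keyword list are all hypothetical; real Brazilian entries are likely written in Portuguese (e.g. "Lei nº ..."), so the pattern would need adapting.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed pattern for references written like "law n. 11.234/1900";
# real descriptions may phrase the law of creation differently.
LAW_PATTERN = re.compile(r"\blaw n\.\s*([\d.]+/\d{4})", re.IGNORECASE)

# Hypothetical controlled list of activity keywords.
ACTIVITY_KEYWORDS = ("research", "education", "legal", "indigenous")


@dataclass
class StructuredFields:
    """Hypothetical target fields split out of the free-text description."""
    law_of_creation: Optional[str] = None
    activities: List[str] = field(default_factory=list)


def split_description(description: str) -> StructuredFields:
    """Extract the law of creation and activity keywords from a description."""
    result = StructuredFields()
    match = LAW_PATTERN.search(description)
    if match:
        result.law_of_creation = match.group(1)
    # Whole-word matches only, so "illegal" does not count as "legal".
    result.activities = [
        kw for kw in ACTIVITY_KEYWORDS
        if re.search(rf"\b{kw}\b", description, re.IGNORECASE)
    ]
    return result
```

For instance, `split_description("Created by law n. 11.234/1900; focuses on research and education.")` yields `law_of_creation="11.234/1900"` and `activities=["research", "education"]`. A controlled keyword list keeps the "activities" field searchable across countries, at the cost of missing anything not on the list.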

Other information could be added: the territorial subdivision of a jurisdiction (there are many regional and local public bodies, especially in federations such as Brazil, Germany and the US); its CNPJ number (the number that identifies, in Brazil, any entity with legal existence that is not a natural person), which is used for many purposes, for instance to cross-check its budget income and spending or to sue it in court; and the name of its highest authority ("president", "director", "coordinator"...).
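
If a CNPJ field were added, it could be validated before being used to cross-check budgets. The sketch below implements the standard CNPJ check-digit rule (two mod-11 check digits over the first 12 digits), assuming the classic all-numeric 14-digit format; the function name `cnpj_is_valid` is hypothetical.

```python
def cnpj_is_valid(cnpj: str) -> bool:
    """Check the two mod-11 check digits of a 14-digit numeric CNPJ."""
    # Ignore formatting characters such as ".", "/" and "-".
    digits = [int(c) for c in cnpj if c in "0123456789"]
    # Wrong length, or all-identical digits (e.g. "00000000000000"), is rejected.
    if len(digits) != 14 or len(set(digits)) == 1:
        return False

    def check_digit(base, weights):
        remainder = sum(d * w for d, w in zip(base, weights)) % 11
        return 0 if remainder < 2 else 11 - remainder

    first_weights = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
    second_weights = [6] + first_weights
    d1 = check_digit(digits[:12], first_weights)
    d2 = check_digit(digits[:12] + [d1], second_weights)
    return digits[12] == d1 and digits[13] == d2
```

For example, `cnpj_is_valid("11.222.333/0001-81")` returns `True`, while changing the last digit to `2` makes it return `False`. A check like this only catches typos; it does not confirm that the number belongs to the body in question.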

Use cases such as searching for all institutions with scope X in countries matching criterion Z (e.g., all cultural heritage institutions in Latin America, or all public pediatric emergency hospitals in continental Europe) make me realize that each body is currently allowed one and only one classification (category). Brazilian universities, for example, are classified as "fundacao-publica" (public foundation), but "university" (or "higher education institution", or something of the sort) would be pertinent too. And a university often has a hospital in its medical school, or a pro bono legal clinic in its law school.
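
A minimal sketch of what allowing several classifications per body could look like, and how the "scope X in countries Z" search then becomes a subset test. The `PublicBody` type, the ISO country codes, the classification labels other than "fundacao-publica", and the example bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class PublicBody:
    name: str
    country: str  # assumed ISO 3166-1 alpha-2 code, e.g. "BR"
    # A set instead of a single value, so one body can be both a
    # "fundacao-publica" and a "university".
    classifications: FrozenSet[str] = frozenset()


def find_bodies(
    bodies: Iterable[PublicBody], required: Set[str], countries: Set[str]
) -> List[PublicBody]:
    """Bodies in any of `countries` that carry every classification in `required`."""
    return [
        b for b in bodies
        if b.country in countries and required <= b.classifications
    ]


# Hypothetical example data.
bodies = [
    PublicBody("Example Federal University", "BR",
               frozenset({"fundacao-publica", "university", "hospital"})),
    PublicBody("Example Heritage Institute", "AR",
               frozenset({"cultural-heritage"})),
]
```

With this data, `find_bodies(bodies, {"university"}, {"BR"})` returns only the university, which a single-category schema could not do while also keeping it classified as a public foundation.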

augusto-herrmann commented 7 years ago

Nice observations about the Brazilian data! We should definitely include those as optional fields in the schema, IMO.

Maybe we could start expanding the fields in the schema by adopting the fields of the EU's Core Public Organisation Vocabulary. That would also make the data exportable to this standard.

mattiasaxell commented 3 years ago

@augusto-herrmann Thanks for linking me here. Yes, as you mentioned in the forum, this sounds like a better reference, though I believe one doesn't cancel out the other. It seems that the Core Public Organisation Vocabulary has been deprecated and replaced by the e-Government Core Vocabularies.

I'm wondering whether these have been implemented and made available for each country somewhere, i.e. whether anyone is already maintaining and publishing this as a dataset, e.g. for Sweden, or for any other country for that matter.

I believe that making this accessible as open source, and available as concepts and datasets on data portals such as the Swedish Dataportal, could increase standardization and use. The data could be uploaded, or perhaps rather harvested from an external source, to be displayed on the Swedish Dataportal. Sweden currently has neither an authority nor a collaborative group of organizations that publishes and maintains a dataset of all authorities according to the e-Government Core Vocabularies or any other standard or specification, but I think this project could act as a vehicle to facilitate that process.

augusto-herrmann commented 3 years ago

Those are excellent suggestions, @mattiasaxell! Thanks for looking it up.

Would you also like to take a look at the fields in the e-Government Core Vocabularies and map out what we would need to change in the schema here to make it compatible / compliant?
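
One possible shape for such a map is a plain dictionary from local field names to vocabulary terms, plus a helper that lists the fields still lacking a counterpart. The local field names are an assumed subset of the current schema, and the target terms are unverified guesses drawn from SKOS, the W3C Organization Ontology, Dublin Core and FOAF; each one should be checked against the published e-Government Core Vocabularies before relying on it.

```python
from typing import Dict, Iterable, List

# Hypothetical, UNVERIFIED mapping: both the local field names and the
# target terms are assumptions to be checked against the specification.
FIELD_MAP: Dict[str, str] = {
    "name": "skos:prefLabel",
    "abbreviation": "skos:altLabel",
    "classification": "org:classification",
    "parent_id": "org:subOrganizationOf",
    "jurisdiction_code": "dct:spatial",
    "url": "foaf:homepage",
}


def unmapped_fields(schema_fields: Iterable[str],
                    field_map: Dict[str, str] = FIELD_MAP) -> List[str]:
    """Local fields with no vocabulary term yet, i.e. the gaps to discuss."""
    return sorted(f for f in schema_fields if f not in field_map)


def map_record(record: dict, field_map: Dict[str, str] = FIELD_MAP) -> dict:
    """Rename a record's mapped, non-empty fields to vocabulary terms."""
    return {
        field_map[key]: value
        for key, value in record.items()
        if key in field_map and value not in (None, "")
    }
```

Running `unmapped_fields` over the full schema would produce the list of fields that need either a vocabulary term or a decision to keep them as local extensions.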

As for the Swedish data we have here, they are probably very outdated. If anyone here is knowledgeable about Swedish data sources, it may be worth opening another issue to update that dataset.

mattiasaxell commented 3 years ago

@augusto-herrmann Thanks! No worries.

Yes, I can have a go at it later.

Yes, they are very outdated; I know, since I was the one who last added and updated them. However, I think the issue here is that it does not make much sense to try to keep the data up to date in Sweden through volunteer efforts, because the original data sources are not open data, are not published correctly, and are not coordinated collaboratively by the owners of the different data. I'd rather spend energy on supporting the data owners in opening and organizing their data here, and on showing how that can put Sweden closer to the forefront of open data. However, we lack the resources to do this work.

augusto-herrmann commented 3 years ago

Sure. Let's do what is possible for the moment, then.

If even one of the many data sources is published as open data, or even on a website that could be scraped, it might be worth automating the data collection, as I did recently for the Brazilian data, in order to have a partially updated dataset. Additionally, if and when someone has the time and energy to support the data owners in opening up their data, that would be awesome as well.
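
A "partially updated dataset" can be sketched as a merge: rows returned by an automated source overwrite the matching existing rows, while rows the source does not cover keep their old values. The `id` key, the hypothetical `last_checked` column (so stale rows stay visible), and the function name are assumptions, not the project's actual scraper.

```python
from datetime import date
from typing import Dict, List


def merge_scraped(existing: List[dict], scraped: List[dict],
                  retrieved: date) -> List[dict]:
    """Overlay freshly collected rows on the existing dataset, keyed by "id"."""
    merged: Dict[str, dict] = {row["id"]: dict(row) for row in existing}
    for row in scraped:
        # Start from the old row (if any) so fields the source lacks survive.
        updated = dict(merged.get(row["id"], {}))
        updated.update(row)
        # Hypothetical column recording when this row was last confirmed.
        updated["last_checked"] = retrieved.isoformat()
        merged[row["id"]] = updated
    # Rows not returned by the source are kept untouched, hence "partial".
    return sorted(merged.values(), key=lambda r: r["id"])
```

Keeping unmatched rows instead of deleting them is a deliberate choice here: a body missing from one scraped source has not necessarily been dissolved.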

Anyway, if we need to further discuss the Swedish data, I would suggest opening a new issue, as this one is for the data schema. 🙂

mattiasaxell commented 3 years ago

@augusto-herrmann Sounds great. Starting a new issue now.

Might it be worth writing a data specification for such a dataset? We could use ReSpec (https://respec.org/docs/), which is open source (https://github.com/w3c/respec), and build on the e-Government Core Vocabularies (https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/e-government-core-vocabularies).

augusto-herrmann commented 3 years ago

I think we could indeed write a textual specification once the schema becomes stable enough. For now, to close this issue, I think we should at least put together a proposal for the new set of tables and fields we are going to use, after reviewing the Core Vocabularies, of course.
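
Such a proposal could be written as Frictionless Table Schema descriptors, the format used by Data Packages. The sketch below assumes a `bodies` resource gaining the optional fields raised in this thread and a separate link table so that a body can have several classifications; all resource and field names are hypothetical, and the helper only checks that the foreign key points at fields that exist.

```python
from typing import Dict

# Hypothetical proposal: optional fields from this thread on the main table.
BODIES_SCHEMA = {
    "fields": [
        {"name": "id", "type": "string",
         "constraints": {"required": True, "unique": True}},
        {"name": "name", "type": "string", "constraints": {"required": True}},
        {"name": "law_of_creation", "type": "string"},
        {"name": "cnpj", "type": "string"},
        {"name": "highest_authority", "type": "string"},
    ],
    "primaryKey": "id",
}

# Hypothetical link table: one row per (body, classification) pair.
CLASSIFICATIONS_SCHEMA = {
    "fields": [
        {"name": "body_id", "type": "string", "constraints": {"required": True}},
        {"name": "classification", "type": "string",
         "constraints": {"required": True}},
    ],
    "primaryKey": ["body_id", "classification"],
    "foreignKeys": [
        {"fields": "body_id", "reference": {"resource": "bodies", "fields": "id"}}
    ],
}


def field_names(schema: dict) -> set:
    return {f["name"] for f in schema["fields"]}


def foreign_keys_resolve(schema: dict, resources: Dict[str, dict]) -> bool:
    """True if every foreign key uses existing local and referenced fields."""
    for fk in schema.get("foreignKeys", []):
        target = resources.get(fk["reference"]["resource"])
        if target is None:
            return False
        if fk["reference"]["fields"] not in field_names(target):
            return False
        if fk["fields"] not in field_names(schema):
            return False
    return True
```

Country-specific identifiers such as the CNPJ might instead fit better in a generic identifiers table; the flat layout above is only one option to compare against the Core Vocabularies.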