Closed aidiss closed 5 years ago
Here is an example of how providers are described:
https://github.com/sirex/opendata/blob/master/providers/gov/lrs.yml
But there is almost no value in having the full list of providers described. The idea of this repository is to track how much of the open data demand is satisfied. In order to track this, we need information about these things:
So even a single fully described data field has much more value than a bulk import of all the organizations in a municipality.
I would like to track the % of data that is opened by the provider. The value is in monitoring the progress. Can it be done in this repository?
Yes, you can track that in this repository.
But there is a question. Let's say there are data sources A, B, C and D. A, B and C are fully open: available for free and in an open, structured format. D is not open. And there are projects X, Y and Z, all of which need the D data. So if we monitor only the percentage of open data sources, as you suggest, we will get really good results: 75% of all data sources are open. But if we measure the impact, it will be 0%, because the most important data source, D, is not open, which makes the other 75% of data sources useless for these projects.
Of course, in this repository you can describe all data sources and monitor how many of them are open, and this would be one indicator of open data progress. If that is what you really want to measure, that is OK.
But personally, I think it is more important to measure the impact.
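The difference between the two indicators in the A-D example above can be sketched in a few lines of Python. This is only an illustration: the dictionaries and function names are hypothetical and are not part of this repository, and it assumes each project needs only source D.

```python
# Hypothetical data mirroring the example: sources A-D and projects X-Z.
# Assumption: each project needs only source D.
sources_open = {"A": True, "B": True, "C": True, "D": False}
project_needs = {"X": {"D"}, "Y": {"D"}, "Z": {"D"}}


def open_share(sources_open):
    """Percentage of data sources that are open."""
    return 100 * sum(sources_open.values()) / len(sources_open)


def impact_share(sources_open, project_needs):
    """Percentage of projects whose every required source is open."""
    satisfied = sum(
        all(sources_open.get(source, False) for source in needs)
        for needs in project_needs.values()
    )
    return 100 * satisfied / len(project_needs)


print(open_share(sources_open))                   # 75.0
print(impact_share(sources_open, project_needs))  # 0.0
```

Counting sources gives 75%, while counting projects that can actually be built gives 0%, which is the gap described above.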
I will try to visualize it:
     ,--( You are here. )
+----------+    +--------+    +----------------+    +---------+    +--------+
| provider | -> | source | -> | transformation | -> | project | -> | impact |
+----------+    +--------+    +----------------+    +---------+    +--------+
                                              ( I want to get there )--'
So knowing the available data providers and how open their data sources are is important and needed, but insufficient if you want to get to the impact part.
It seems like a chicken-and-egg problem. If you have a data demand, you can look for who can provide the data; if you have described what is available, some project may be inspired by the data.
For example, at the Kaunas open data hackathon we had 3 startups about trees, and part of the inspiration came from the very detailed tree datasets that the municipality had provided.
Kaunas municipality has a very limited list of institutions it controls, so it's possible to go through their websites, ask some questions and describe what data they actually have. Retrieving the data could be more problematic, of course.
I would like to think that we represent those who would use the data, so our main goal is to say what data we need. On the other hand, the government is responsible for providing information about what data it has. Actually, the government has huge resources and millions of euros for that. So I hope it will deliver its part by describing the data sources it has. And I would like to concentrate on our part: telling what we need.
Currently there are multiple data catalogs, where you can find available datasets, to name a few:
The idea is to help the government provide high-quality data that meets the demand. If we do nothing, we will get just raw data, which still requires a lot of work before it can be used in a project. So I want to describe exactly what we need, and hopefully get that using government resources and money.
I have created a list of organizations that are managed by Kaunas municipality.
It was scraped from http://www.kaunas.lt/administracija/struktura-ir-kontaktai/pavaldzios-imones-ir-istaigos/
How could we store these companies?
Note that we could augment the list using information from rekvizitai.lt
I used the following code to create the list:
A list of Kaunas municipality companies