Closed: pzwsk closed this issue 5 years ago.
Here is the proposal for a new set of indicators:
Dataset score
We keep the scoring system for single datasets, with the following weights for each criterion.
Criteria | Open Data | Restricted | Closed | Unknown | Weight (%) |
---|---|---|---|---|---|
Does the data exist? | YES | YES | Y/N | | +50 |
Is the data publicly available? | YES | YES | NO | | +15 |
Is the data available in digital form? | YES | Y/N | | | +5 |
Is the data available online? | YES | Y/N | | | +5 |
Is the metadata available online? | YES | Y/N | | | +5 |
Is the data available in bulk? | YES | Y/N | | | +5 |
Is the data machine-readable? | YES | Y/N | | | +5 |
Is the data available for free? | YES | Y/N | | | +5 |
Is the data openly licensed? | YES | Y/N | | | +5 |
Is the data provided on a timely and up to date basis? | Y/N | | | | +0 |
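A minimal sketch of how these weights add up to a dataset score, assuming each answer is stored as a boolean under a short criterion key. The keys are hypothetical names chosen for illustration, not identifiers from the Index code. The weights sum to 100, so a dataset answering YES everywhere scores 100%, and the timeliness question (+0) never changes the score.

```python
# Proposed weights (percent) from the table above.
# The criterion keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "exists": 50,
    "public": 15,
    "digital": 5,
    "online": 5,
    "metadata_online": 5,
    "bulk": 5,
    "machine_readable": 5,
    "free": 5,
    "open_license": 5,
    "up_to_date": 0,  # recorded, but carries no weight in the score
}


def dataset_score(answers: dict[str, bool]) -> int:
    """Sum the weights of every criterion answered YES; missing answers count as NO."""
    return sum(weight for key, weight in WEIGHTS.items() if answers.get(key, False))


# The data exists and is public, but no other criterion is met.
print(dataset_score({"exists": True, "public": True}))  # 65
```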
The dataset indicator is then determined from the dataset score:
- Open Data: >= 100%
- Restricted: < 100% AND >= 65%
- Closed: < 65% AND >= 0%
- Unknown: no dataset submitted for the key dataset
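The thresholds can be read as a small function. This is a sketch assuming scores on the 0 to 100 scale above and, as a convention chosen only for illustration, `None` for a key dataset with no submission. A score of 0 (for example, data that does not exist) falls under Closed; Unknown is reserved for datasets that were never submitted.

```python
from typing import Optional


def indicator_from_score(score: Optional[float]) -> str:
    """Map a dataset score (0-100) to an indicator; None means nothing was submitted."""
    if score is None:
        return "unknown"
    if score >= 100:
        return "open data"
    if score >= 65:
        return "restricted"
    return "closed"  # 0 <= score < 65
```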
For each dataset, we return the dataset indicator.
Country indicator
For each country, we provide:
- Number of open data datasets
- Number of restricted datasets
- Number of closed datasets
- Number of unknown datasets
Note: the total number of datasets submitted = open data + restricted + closed.
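A sketch of the country indicator, assuming each of the country's key datasets has already been labelled with one of the four indicators (the label strings are illustrative). Unknown datasets were never submitted, so they are left out of the submitted total.

```python
from collections import Counter

INDICATORS = ("open data", "restricted", "closed", "unknown")


def country_indicators(dataset_indicators: list[str]) -> dict[str, int]:
    """Count one country's key datasets per indicator, plus the number submitted."""
    counts = Counter(dataset_indicators)
    result = {name: counts.get(name, 0) for name in INDICATORS}
    # Unknown means "not submitted", so it does not count towards the total.
    result["submitted"] = result["open data"] + result["restricted"] + result["closed"]
    return result


print(country_indicators(["open data", "closed", "closed", "unknown"]))
# {'open data': 1, 'restricted': 0, 'closed': 2, 'unknown': 1, 'submitted': 3}
```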
Dataset indicator
We remove the scoring system for single datasets.
For each dataset, we compute and return the dataset indicator using boolean conditions (see table above).
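Under this option, the table is read as a decision tree instead of being summed. A minimal sketch, using hypothetical criterion keys and assuming the conditions implied by the table: Closed when the data does not exist or is not publicly available, Open Data when every remaining criterion except timeliness is YES, and Restricted otherwise. This gives the same result as the score thresholds, since 50 + 15 = 65 and the seven +5 criteria make up the remaining 35.

```python
from typing import Optional

# Hypothetical criterion keys; all must be YES (on top of "exists" and "public") for Open Data.
OPENNESS_CRITERIA = ("digital", "online", "metadata_online", "bulk",
                     "machine_readable", "free", "open_license")


def indicator_from_answers(answers: Optional[dict[str, bool]]) -> str:
    """Classify a dataset with boolean conditions only, without computing a score."""
    if answers is None:
        return "unknown"  # no dataset submitted for the key dataset
    if not (answers.get("exists", False) and answers.get("public", False)):
        return "closed"  # missing, or not publicly available
    if all(answers.get(key, False) for key in OPENNESS_CRITERIA):
        return "open data"  # "up_to_date" is deliberately not checked
    return "restricted"
```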
Country indicator
Same as above
Hi @oncletom @CIMAManuel @nastasi-oq, see the suggestion for a new system of indicators.
To be discussed and decided tomorrow.
Main questions being
Many thanks!
@pzwsk this algorithm is wrong IMHO; we should just use a decision tree as described in the table above.
> - Open Data >= 100%
> - Restricted < 100 AND >= 65
> - Closed < 65 AND >= 0
OK, this was an attempt to keep the scoring system for single datasets, but I would also prefer to use a decision tree. Let me sketch one quickly.
Hi @nastasi-oq, the decision tree is actually quite simple; see below and let me know what you think.
Note: I am not considering the up-to-date criterion in the evaluation.
See #305
The proposal is to remove the country score and replace it with a set of indicators giving, for each country, the number of datasets that are open data, restricted, closed or unknown.
First, let's look at how the current version works.
Dataset score
The OpenDRI Index uses a set of 10 criteria, formulated as questions and weighted as percentages, to assess the extent to which a given dataset is open.
For more information and the weight assigned to each criterion, see https://index.opendri.org/methodology.html#opendata
A dataset is considered fully open when all questions have been answered YES (score = 100%). When a dataset does not exist or has not been submitted, its score is 0.
Country score
The country score is the average of all dataset scores for a given country. It is also expressed as a percentage.
Note: it is possible to submit more than one entry for a given dataset and a given country, and the website stores all of them. However, for comparison purposes, only the entry with the highest score is retained for the country score.
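A sketch of this averaging, assuming the caller passes the key datasets applicable to the country together with the submitted entries as (dataset, score) pairs; the function and dataset names are hypothetical. Each dataset keeps only its highest-scoring entry, and key datasets without any entry contribute 0, as stated for the dataset score.

```python
def country_score(key_datasets: list[str], entries: list[tuple[str, float]]) -> float:
    """Average the best score per key dataset; unsubmitted key datasets count as 0."""
    best = {name: 0.0 for name in key_datasets}
    for name, score in entries:
        if name in best:  # ignore entries for datasets outside the key list
            best[name] = max(best[name], score)
    return sum(best.values()) / len(best) if best else 0.0


# "roads" has no entry, so it contributes 0: (100 + 50 + 0) / 3
print(country_score(["hazard map", "population", "roads"],
                    [("hazard map", 65), ("hazard map", 100), ("population", 50)]))  # 50.0
```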
For the country score, only the hazards for which the level is assessed as medium or higher on ThinkHazard! are taken into account. This means that datasets applicable only to hazards with a low or very low level on ThinkHazard! are not considered for assessing a country since the interest in such data is negligible. For instance, data related to tsunami will not be considered when assessing a landlocked country.
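A sketch of the hazard filter, assuming each dataset is mapped to the hazards it applies to and each country to a ThinkHazard! level per hazard (very low, low, medium, high). The mapping structures are hypothetical. Two further assumptions are made here: a hazard with no recorded level is treated as very low, and an empty hazard list marks a dataset that is not hazard-specific and is always kept.

```python
# Ordinal ThinkHazard! levels; "medium" and above count towards the country score.
LEVELS = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def relevant_datasets(dataset_hazards: dict[str, list[str]],
                      country_levels: dict[str, str]) -> list[str]:
    """Keep datasets applicable to at least one hazard assessed medium or higher."""
    kept = []
    for name, hazards in dataset_hazards.items():
        if not hazards:
            kept.append(name)  # assumption: not tied to a hazard, always kept
        elif any(LEVELS[country_levels.get(h, "very low")] >= LEVELS["medium"]
                 for h in hazards):
            kept.append(name)
    return kept


# A landlocked country: tsunami is very low, so the tsunami dataset is dropped.
print(relevant_datasets(
    {"tsunami inundation": ["tsunami"], "flood map": ["river flood"], "population": []},
    {"tsunami": "very low", "river flood": "high"},
))  # ['flood map', 'population']
```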
It is also possible to filter and compare countries by category or hazard. For instance, by selecting Base data, only datasets from this category will be taken into account in the overall openness; by selecting Earthquake, only datasets applicable to this hazard will be taken into account.
For more info see: https://index.opendri.org/methodology.html#score