How to change from old country score to new country open data indicators?

pzwsk commented 5 years ago

See #305

The proposal is to remove the country score and replace it with a set of indicators giving, for each country, the number of datasets that are open data, restricted, closed, or unknown.

First, let's look at how the current version works.

Dataset score

The OpenDRI Index uses a set of 10 criteria formulated as questions, weighted in percentage, to assess to what extent a given dataset is open.

For more info and the weights assigned to each criterion, see https://index.opendri.org/methodology.html#opendata

A dataset is considered fully open when all questions have been answered YES (score = 100%). When a dataset does not exist or has not been submitted, then the score is 0.

Country score

The country score is the average of all dataset scores for a given country. It is also expressed as a percentage.

Note: It is possible to submit more than one entry for a given dataset and a given country. The website stores all of them. However, for comparison purposes, only the entry with the highest score is retained for the country score.

For the country score, only the hazards for which the level is assessed as medium or higher on ThinkHazard! are taken into account. This means that datasets applicable only to hazards with a low or very low level on ThinkHazard! are not considered for assessing a country since the interest in such data is negligible. For instance, data related to tsunami will not be considered when assessing a landlocked country.
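To make the current country score concrete, here is a minimal sketch of the rules described above: keep the best submission per key dataset, count missing datasets as 0, drop datasets that apply only to low or very low hazards, then average. The function name, the data shapes, and the idea that each key dataset lists the hazards it applies to are assumptions for illustration, not the dashboard's actual code.

```python
from statistics import mean

# ThinkHazard! levels, ordered from least to most severe.
LEVELS = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def country_score(submissions, key_datasets, hazard_levels):
    """Average of the best score per relevant key dataset (hypothetical helper).

    submissions:   list of (key_dataset_id, score_percent) for one country
    key_datasets:  dict key_dataset_id -> set of hazards the dataset applies to
    hazard_levels: dict hazard -> ThinkHazard! level for the country
                   (hazards not listed are assumed to be "very low")
    """
    # A key dataset is relevant if at least one of its hazards is medium or higher.
    relevant = [
        key
        for key, hazards in key_datasets.items()
        if any(
            LEVELS[hazard_levels.get(h, "very low")] >= LEVELS["medium"]
            for h in hazards
        )
    ]
    # Datasets with no submission keep a score of 0.
    best = {key: 0.0 for key in relevant}
    for key, score in submissions:
        if key in best:
            best[key] = max(best[key], score)
    return mean(best.values()) if best else 0.0


# Example with made-up dataset names for a landlocked country.
key_datasets = {
    "elevation": {"flood", "earthquake"},
    "tsunami_map": {"tsunami"},
    "faults": {"earthquake"},
}
hazard_levels = {"flood": "high", "earthquake": "medium", "tsunami": "very low"}
submissions = [("elevation", 60), ("elevation", 80), ("tsunami_map", 100)]
score = country_score(submissions, key_datasets, hazard_levels)
```

In this example the tsunami dataset is ignored, the best "elevation" entry (80) is kept, and "faults" counts as 0 because nothing was submitted for it.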

It is also possible to filter and compare countries by category or hazard. For instance, by selecting Base data, only datasets from this category will be taken into account in the overall openness; by selecting Earthquake, only datasets applicable to this hazard will be taken into account.

For more info see: https://index.opendri.org/methodology.html#score

pzwsk commented 5 years ago

Here is the proposal for the new set of indicators:

Option 1

Dataset score

We keep the scoring system for single datasets, with the following weights for the criteria.

Criteria	Open Data	Restricted	Closed	Unknown	Weight
Does the data exist?	YES	YES	Y/N	-	+50
Is the data publicly available?	YES	YES	NO	-	+15
Is the data available in digital form?	YES	Y/N	-	-	+5
Is the data available online?	YES	Y/N	-	-	+5
Is the metadata available online?	YES	Y/N	-	-	+5
Is the data available in bulk?	YES	Y/N	-	-	+5
Is the data machine-readable?	YES	Y/N	-	-	+5
Is the data available for free?	YES	Y/N	-	-	+5
Is the data openly licensed?	YES	Y/N	-	-	+5
Is the data provided on a timely and up to date basis?	Y/N	-	-	-	+0

The dataset indicator is then determined from the dataset score:

Open Data: score >= 100%
Restricted: score < 100% AND >= 65%
Closed: score < 65% AND >= 0%
Unknown: no dataset submitted for the key dataset

For each dataset, we return the dataset indicator.
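As a rough sketch of Option 1, the weights and thresholds above could be applied as follows. The criterion keys (`exists`, `public`, and so on) and the function names are placeholders invented for this example; only the weights and the threshold values come from the proposal.

```python
# Weights from the proposal, in percentage points. Keys are hypothetical short names.
WEIGHTS = {
    "exists": 50,
    "public": 15,
    "digital": 5,
    "online": 5,
    "metadata_online": 5,
    "bulk": 5,
    "machine_readable": 5,
    "free": 5,
    "open_license": 5,
    "up_to_date": 0,
}


def dataset_score(answers):
    """Sum the weights of every criterion answered YES (True)."""
    return sum(weight for key, weight in WEIGHTS.items() if answers.get(key, False))


def indicator_from_score(score):
    """Map a score to an indicator; None means no dataset was submitted."""
    if score is None:
        return "Unknown"
    if score >= 100:
        return "Open Data"
    if score >= 65:
        return "Restricted"
    return "Closed"


all_yes = {key: True for key in WEIGHTS}
exists_and_public = {"exists": True, "public": True}
exists_only = {"exists": True}
```

With these weights, a dataset that exists and is publicly available but meets no other criterion scores exactly 65 and lands in Restricted, while a dataset that only exists scores 50 and lands in Closed.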

Country indicator

There is no longer an overall score expressed as a percentage per country.
There is no longer a filter related to ThinkHazard!
All dataset submissions are taken into account, not only the most open one per key dataset.

For each country, we provide:

Number of datasets open data
Number of datasets restricted
Number of datasets closed
Number of datasets unknown

Note: total number of datasets submitted = open data + restricted + closed
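A minimal sketch of how the four country counts could be derived, assuming each submission has already been given a dataset indicator and that "unknown" counts key datasets with no submission at all. The function name and data shapes are hypothetical, not taken from the current APIs.

```python
from collections import Counter

INDICATORS = ("Open Data", "Restricted", "Closed", "Unknown")


def country_indicators(submission_indicators, key_datasets):
    """Count submissions per indicator for one country (hypothetical helper).

    submission_indicators: list of (key_dataset_id, indicator), one per submission
    key_datasets:          iterable of all key dataset ids
    """
    # Every submission counts, not only the most open one per key dataset.
    counts = Counter(indicator for _, indicator in submission_indicators)
    # Unknown = key datasets for which nothing was submitted.
    covered = {key for key, _ in submission_indicators}
    counts["Unknown"] = len(set(key_datasets) - covered)
    return {name: counts.get(name, 0) for name in INDICATORS}


# Example with made-up key dataset names.
result = country_indicators(
    [("elevation", "Open Data"), ("elevation", "Restricted"), ("roads", "Closed")],
    ["elevation", "roads", "rivers"],
)
```

Here the two "elevation" submissions are both counted, so the number of submitted datasets (3) equals open data + restricted + closed, as in the note above.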

Option 2

Dataset indicator

We remove the scoring system for single datasets.

For each dataset, we compute and return the dataset indicator using boolean conditions (see table above).
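One possible reading of the table as boolean conditions is sketched below. It is an interpretation, not an agreed rule: the criterion keys are invented names, a dataset answered as not existing is assumed to be Closed (the table does not spell out that case), and the timeliness criterion is ignored because its weight is 0.

```python
# Criteria other than "exists" and "public"; names are hypothetical.
OTHER_CRITERIA = (
    "digital",
    "online",
    "metadata_online",
    "bulk",
    "machine_readable",
    "free",
    "open_license",
)


def dataset_indicator(answers):
    """Classify one submission from its YES/NO answers; None means no submission."""
    if answers is None:
        return "Unknown"
    # Assumption: data that does not exist or is not publicly available is Closed.
    if not answers.get("exists", False) or not answers.get("public", False):
        return "Closed"
    # Public data is Open Data only if every remaining criterion is YES.
    if all(answers.get(key, False) for key in OTHER_CRITERIA):
        return "Open Data"
    return "Restricted"


fully_open = {"exists": True, "public": True, **{k: True for k in OTHER_CRITERIA}}
not_free = {**fully_open, "free": False}
not_public = {**fully_open, "public": False}
```

Note that this reading does not always agree with the score thresholds of Option 1: `not_public` is Closed here, but with the Option 1 weights it would score 50 + 7 x 5 = 85, which falls in the Restricted range.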

Country indicator

Same as above

pzwsk commented 5 years ago

Hi @oncletom @CIMAManuel @nastasi-oq, see the suggestion above for the new system of indicators.

To be discussed and decided tomorrow.

The main questions are:

What would be the best option in terms of processing time (taking into account operations done on both the FE and BE sides)?
Which is easiest to implement for both BE and FE (taking into account the current APIs)?
Am I missing anything needed to cover #305?

Many thanks!

nastasi-oq commented 5 years ago

@pzwsk this algorithm is wrong IMHO; we must just use a decision tree as described in the table above.

Open Data: score >= 100%
Restricted: score < 100% AND >= 65%
Closed: score < 65% AND >= 0%

pzwsk commented 5 years ago

OK, this was an attempt to keep the scoring system for single datasets, but I would also prefer to use a decision tree. Let me sketch one quickly.

pzwsk commented 5 years ago

Hi @nastasi-oq the decision tree is actually quite simple, see below and let me know what you think.

Note: I am not considering the up-to-date criterion in the evaluation.

GFDRR / open-risk-data-dashboard