Closed pzwsk closed 5 years ago
From @oncletom in #221 : what to think about a 95% score with 0% open data?
Here is a sample of the API response for /api/country_scoring/ at the moment:
{
"keydatasets_count": 36,
"fullscores_count": 81,
"datasets_count": 176,
"countries_count": 19,
"countries": [
{
"score": "32.4",
"fullscores_count": 9,
"datasets_count": 12,
"country": "AU",
"rank": 1
},
// ...
]
}
This is how I understand we provide the values for each column, per country:
fullscores_count
datasets_count
datasets_count - keydatasets_count
Is that it?
Feedback from @vdeparday
Split bars might confuse users as they may think it is possible to have 100% for all bars
Then, stacked bars might be better
Understood.
If there is a sentiment of progression, I'd use the colour contrast to convey this feeling. If it's about the openness: more contrast = open, less contrast = not open.
Option 2 looks much better I think. It is easier to understand and compare. I am just wondering about the visibility of the legend as you scroll down, you will keep it above? And may be we can add tooltips on hover of the stacked bar.
I suggest to put some text below each indicator to explain. @gracedoherty can you review language especially? Thanks
Hi @oncletom re your comment made on Nov 21, we need to discuss with @nastasi-oq
I am going to open a new issue as this may involve some important changes in BE and API.
We will continue to use this issue as main umbrella issue.
Open Data: free to access, use and share
Restricted: technical, legal or cost restrictions
Closed: access, use and sharing not permitted
Unknown: more information needed
Thanks, I am revising this a bit based on the latest changes to the definition of closed:
Open Data: free to access, use and share
Restricted: technical, legal or cost restrictions
Closed: access not permitted or does not exist
Unknown: missing information, submission needed
Feedback from @vdeparday
Split bars might confuse users as they may think it is possible to have 100% for all bars
Then, stacked bars might be better
This also needs to be decided based on ease of implementation. We may also keep the unknown column independent from the others.
Last proposal
Open Data, Restricted and Closed in one stacked progress bar:
Percentage (Open Data) = number of open data / 100
Percentage (Restricted) = number of restricted / 100
Percentage (Closed) = number of closed / 100
A more rigorous option would be to replace 100 by the maximum number of datasets for a country (number of datasets submitted + number of key datasets without submission).
Unknown in another progress bar:
Percentage = number of key datasets without any submission / total number of key datasets
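As a rough illustration of this last proposal, the sketch below computes the stacked-bar shares using the more rigorous denominator (datasets submitted + key datasets without submission) and the separate unknown bar. The function name, argument names and sample numbers are hypothetical and not taken from the backend; it also assumes that the datasets submitted are exactly the open, restricted and closed ones.

```python
from typing import Dict


def bar_percentages(open_count: int, restricted: int, closed: int,
                    unknown: int, key_datasets_total: int) -> Dict[str, float]:
    """Shares (in percent) for the stacked bar and the separate unknown bar."""
    # Assumption: submitted datasets = open + restricted + closed.
    submitted = open_count + restricted + closed
    # "Rigorous" denominator: submitted datasets + key datasets without submission.
    denominator = submitted + unknown

    def share(part: int, whole: int) -> float:
        # Guard against an empty denominator.
        return 100.0 * part / whole if whole else 0.0

    return {
        "open": share(open_count, denominator),
        "restricted": share(restricted, denominator),
        "closed": share(closed, denominator),
        # Unknown bar: key datasets without any submission / all key datasets.
        "unknown": share(unknown, key_datasets_total),
    }


# Hypothetical country: 18 submitted datasets, 18 key datasets without submission.
example = bar_percentages(open_count=9, restricted=6, closed=3,
                          unknown=18, key_datasets_total=36)
```

With this denominator the three stacked segments can never add up to more than 100%, which is the point of the stacked layout.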
Not a blocking thought, just a consideration: if we stop taking ThinkHazard! into account, there will be a bias for countries with fewer perils (because there are no data for perils that are not relevant, where they are not needed) compared with others.
Yes, true but
@pzwsk, @oncletom, @CIMAManuel UPDATED WITH SORT: on the be_scoring-new branch we have a working experimental version of the new scoring system (without ranking). It is already installed on our experimental instance (currently without any kind of filters) at: https://exp.riskopendata.org/api/scoring_new/
[
...
["AU", 17, 4, 0, 25],
["GN", 0, 0, 0, 36],
["JM", 0, 0, 0, 36],
...
]
Where the key is the wordbank id of the country and the four columns are:
The sum of all 4 values produces the denominator described by pzwsk
More rigorous option would to replace ...
NOTE: like the old version, this one pre-computes its quantities when a dataset is saved; the computation can also be forced using the already working Scoring Update button.
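To make the positional rows above easier to read, here is a minimal sketch that maps each row to named fields and computes the denominator as the sum of the four values. The column order (open, restricted, closed, unknown) is an assumption, since the thread does not spell it out, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class CountryCounts(NamedTuple):
    country: str
    # Assumed column order: open, restricted, closed, unknown.
    open_count: int
    restricted_count: int
    closed_count: int
    unknown_count: int

    @property
    def denominator(self) -> int:
        # Sum of the four values, used as the denominator of the percentages.
        return (self.open_count + self.restricted_count
                + self.closed_count + self.unknown_count)


def parse_rows(rows: List[list]) -> List[CountryCounts]:
    """Turn positional rows into named records."""
    return [CountryCounts(*row) for row in rows]


# Rows copied from the sample response above.
rows = [["AU", 17, 4, 0, 25], ["GN", 0, 0, 0, 36], ["JM", 0, 0, 0, 36]]
parsed = parse_rows(rows)
```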
Thanks @nastasi-oq, it's great to have the value.
I was somewhat expecting to have these values as part of /api/country_scoring/ and /api/country_scoring/:country. I don't see the value in doing another API call for basic information related to a country.
A second point is related to the data format. Unnamed fields sound fragile and not very explicit. I prefer explicit names, also because they avoid having to write code to consume the data.
Would it be possible to have an output which looks like this for /api/country_scoring/ ...
{
"keydatasets_count": 36,
"fullscores_count": 81,
"countries_count": 20,
"datasets_count": 180,
"countries": [{
"rank": 1,
"fullscores_count": 7,
"score": 31.8,
"datasets_open_count": 7,
"datasets_count": 12,
"datasets_restricted_count": 3,
"datasets_closed_count": 2,
"datasets_unknown_count": 22,
"country": "AU"
},
{
// ...
}
]
}
... and like this for /api/country_scoring/:country?
{
"keydatasets_count": 36,
"fullscores_count": 7,
"scores": [ ... ],
"datasets_count": 12,
"score": 31.8,
"datasets_open_count": 7,
"datasets_restricted_count": 3,
"datasets_closed_count": 2,
"datasets_unknown_count": 22
}
It's exactly the same data, but in existing routes.
@oncletom it was just a preview to understand whether the data are consistent with what we want.
About the previous syntax: are the fullscores_*, score and datasets_open_count fields still used?
I started from scratch to check performance too, but if I must also include expensive data gathering we fall back to slow queries.
I will start rearranging the output into a more suitable structure.
OK, understood (I thought it was the final proposal).
I suspect fullscores_count can be replaced by datasets_count on the frontend side, to represent submitted datasets (unless there is a meaningful difference with fullscores_count).
What I understand is that score will remain but will be computed differently, as per https://github.com/GFDRR/open-risk-data-dashboard/issues/415#issuecomment-458580393.
I will adjust #424 to follow the rework of the API when you next update exp.riskopendata.org.
What do you think?
@oncletom this is the current outcome from api/scoring/ (the old one is accessible at api/scoring_old/) on exp.:
{
"datasets_count": 473,
"keydatasets_count": 36,
"countries_count": 247,
"countries": [
{
"datasets_closed_count": 5,
"datasets_restricted_count": 29,
"datasets_unknown_count": 12,
"rank": 1,
"datasets_open_count": 5,
"datasets_count": 39,
"score": 33.5,
"country": "YF"
},
{
"datasets_closed_count": 3,
"datasets_restricted_count": 32,
"datasets_unknown_count": 11,
"rank": 2,
"datasets_open_count": 2,
"datasets_count": 37,
"score": 31.7,
"country": "AL"
},
....
]
}
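In the two countries shown above, datasets_count equals the sum of the open, restricted and closed counts (5 + 29 + 5 = 39 and 2 + 32 + 3 = 37). Whether this holds for every country is an assumption; a quick check along these lines could confirm it on the full response. The function name is hypothetical.

```python
from typing import Dict, List


def inconsistent_countries(response: Dict) -> List[str]:
    """Return country codes whose datasets_count differs from open + restricted + closed."""
    bad = []
    for c in response["countries"]:
        total = (c["datasets_open_count"]
                 + c["datasets_restricted_count"]
                 + c["datasets_closed_count"])
        if total != c["datasets_count"]:
            bad.append(c["country"])
    return bad


# The two entries copied from the response above.
sample = {
    "countries": [
        {"country": "YF", "datasets_count": 39, "datasets_open_count": 5,
         "datasets_restricted_count": 29, "datasets_closed_count": 5,
         "datasets_unknown_count": 12},
        {"country": "AL", "datasets_count": 37, "datasets_open_count": 2,
         "datasets_restricted_count": 32, "datasets_closed_count": 3,
         "datasets_unknown_count": 11},
    ]
}
mismatches = inconsistent_countries(sample)
```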
Amazing, it looks good, thank you!
I will have a look at it tonight so that you can have feedback for tomorrow. Although I can't see what would need to change at this stage.
Hi, great to see we are converging on this
My comments:
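- the proposal is to get rid of score and rank fields in the end;
- I would also remove countries_count (I think we still use it for home page indicator though?);
- would be good to have a look at how indicators are computed, could you point us to current code?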
- the proposal is to get rid of score and rank fields in the end;
They are already there, so we can use them as is (they are currently consistent with the score) and change to a final version later.
- I would also remove countries_count (I think we still use it for home page indicator though?);
As you prefer, @oncletom give me an LGTM and I proceed.
- would be good to have a look at how indicators are computed, could you point us to current code?
The pre-computed part, instead, is done in the get_score_calculate_new class method:
https://github.com/GFDRR/open-risk-data-dashboard/compare/master...be_scoring-new#diff-358ba6dc1c7f31b62296c6c484e774e7R352
The biggest part of the job is done in the all_countries_new class method:
https://github.com/GFDRR/open-risk-data-dashboard/compare/master...be_scoring-new#diff-2fc7c76ad15b7f095e4e9b3cf2aeafbfR895
countries_count is still in use, but only via the /api/stats route
rank is not used anymore
score can be replaced by a custom sorting method, client-side (at the moment, it is in use to sort the "Open / Restricted / Closed" column).
THIS IS A PROPOSAL UNDER DISCUSSION [TO BE AGREED ON]
What are the main questions this proposal is addressing?
For each dataset submitted, evaluate its open data status according to the following indicators:
Then for each country:
number of open data = number of datasets submitted being classified as open data
percentage of open data = number of open data /JOINT(number of datasets submitted, total number of datasets considered)
same for closed and restricted
number of unknown = number of datasets without any dataset submission
percentage of unknown = number of datasets without any dataset submission / total number of datasets considered
Then category and hazard filters apply
Default sorting works in following order: number open data THEN number restricted THEN number closed THEN number unknown
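The default sorting above can be sketched as a composite sort key. The proposal does not state a direction, so this example assumes larger counts come first for every criterion; the field names follow the API response discussed in the thread, the "XX" entry is hypothetical, and real client-side code would be written in the frontend's language rather than Python.

```python
from typing import Dict, List


def default_sort(countries: List[Dict]) -> List[Dict]:
    """Sort by open, then restricted, then closed, then unknown (descending, by assumption)."""
    return sorted(
        countries,
        key=lambda c: (
            -c["datasets_open_count"],
            -c["datasets_restricted_count"],
            -c["datasets_closed_count"],
            -c["datasets_unknown_count"],
        ),
    )


countries = [
    {"country": "AL", "datasets_open_count": 2, "datasets_restricted_count": 32,
     "datasets_closed_count": 3, "datasets_unknown_count": 11},
    {"country": "YF", "datasets_open_count": 5, "datasets_restricted_count": 29,
     "datasets_closed_count": 5, "datasets_unknown_count": 12},
    # Hypothetical entry that ties with YF on open data and wins on restricted.
    {"country": "XX", "datasets_open_count": 5, "datasets_restricted_count": 30,
     "datasets_closed_count": 0, "datasets_unknown_count": 10},
]
ordered = [c["country"] for c in default_sort(countries)]
```

Because the key is a tuple, a later criterion only matters when all earlier ones are tied, which matches the THEN wording of the proposal.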
Option 1 with Split Bars
Option 2 with Stacked Bars
See #264 #271 and #270 for background