IATI / D-Portal

http://d-portal.org/

Use activity status rather than dates #674

Open stevieflow opened 5 months ago

stevieflow commented 5 months ago

As part of d-portal reflecting the IATI standard, it would be good to directly use the activity-status field to declare the activity status

Current situation

Right now, we know that d-portal interprets the activity-date to then declare activities as either Active or Ended

This makes logical sense, but it can disguise poor data quality, which is something we now want to know about and avoid, by using d-portal directly

Here's one example (@isabelbirds can supply others) : https://d-portal.org/ctrack.html?reporting_ref=44000#view=act&aid=44000-P127974

Firefox_Screenshot_2024-04-11T11-20-55 913Z

Yes, this has actual-end dates, but the status is still implementation

Right now, this appears in lists of Ended activities:

Firefox_Screenshot_2024-04-11T11-20-38 913Z
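
For reference, the date-derived grouping described above can be sketched roughly as follows. This is a minimal illustration, not d-portal's actual code: the function name `classify_by_dates`, the exact rules and the sample dates are all assumptions. It shows why an activity with an end date in the past lands in Ended regardless of its published activity-status.

```python
from datetime import date
from typing import Optional


def classify_by_dates(start: Optional[date], end: Optional[date], today: date) -> str:
    """Hypothetical date-derived label; the real d-portal rules may differ."""
    # Assumption: missing or inverted dates are treated as a data issue.
    if start is None or end is None or end < start:
        return "Other"
    if start > today:
        return "Planned"
    if end < today:
        return "Ended"
    return "Active"


# Hypothetical activity: the end date is in the past, so the published
# status (e.g. 2 = Implementation) is never consulted.
today = date(2024, 4, 11)
print(classify_by_dates(date(2015, 1, 1), date(2023, 6, 30), today))  # Ended
```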

Ideal scenario proposal

We follow the standard directly and group activities by their status:

I appreciate that is a front-end / layout change ---> but it seems more acceptable than us thinking up a grouping mechanism for these status codes, which may in turn disguise things again!

@YohannaLoucheur may be interested in this, given context discussions in https://github.com/IATI/D-Portal/issues/436

IsabelBirds commented 5 months ago

Agreed - this would (I think) also remove the need for the "other activities" and "planned activities" groupings as these would be covered by the status groupings.

image


Lack of status also creates a critical schema error, so grouping activities using status would unmask any existing critical errors here.

A couple more older examples here:

@stevieflow Do we need any thought around how Cancelled/Suspended activities are displayed? I am aware this can occur due to sensitive reasons.

stevieflow commented 5 months ago

I think "other activities" are data issues --> but as I understand those issues are derived from the dates then it seems they would not be relevant here (but we can surface these issues in other ways, later)

> Do we need any thought around how Cancelled/Suspended activities are displayed? I am aware this can occur due to sensitive reasons.

It's very easy to access those activities via the filters : https://d-portal.org/ctrack.html?status_code=5%2C6#view=main

I understand the point that explicitly labelling them on screen is more obvious, but the data is published
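
As an aside on the filter link above: the `status_code=5%2C6` query parameter is just the comma-separated list `5,6` (Cancelled, Suspended), percent-encoded. A small sketch using only the Python standard library shows how such a link can be read or built. The helper names `status_codes_from_url` and `status_filter_url` are made up for illustration, and the only thing assumed about d-portal is the URL shape already shown in this thread.

```python
from typing import Iterable, List
from urllib.parse import parse_qs, urlencode, urlsplit


def status_codes_from_url(url: str) -> List[str]:
    """Extract the comma-separated status codes from a ctrack filter URL."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("status_code", [""])[0]
    return [code for code in raw.split(",") if code]


def status_filter_url(codes: Iterable[int]) -> str:
    """Build a filter URL in the same shape as the link quoted in the thread."""
    query = urlencode({"status_code": ",".join(str(c) for c in codes)})
    return f"https://d-portal.org/ctrack.html?{query}#view=main"


print(status_codes_from_url("https://d-portal.org/ctrack.html?status_code=5%2C6#view=main"))
print(status_filter_url([5, 6]))
```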

YohannaLoucheur commented 5 months ago

Agree with your proposal Steven, including keeping the Cancelled and Suspended categories. If the projects are still published, then they should be accessible (if projects are suspended for security reasons, they should be removed from the publisher's file).

We should keep the "Other" category somewhere though - this tends to be a small number of activities with serious data issues (like our 6, with either inverted or missing dates). Seeing this Other category is first intriguing, then a reminder to (try to) resolve the issue.

xriss commented 5 months ago

Other is still going to be a needed category for when an invalid code is used or the code is missing.

We currently have 7389 activities missing status codes and one activity that has decided to use a value of 8.
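
A status-based grouping with an "Other" fallback could look something like the sketch below. The six codes and names come from the IATI ActivityStatus codelist; the function names (`status_group`, `count_by_status`) and the sample data are invented for illustration and are not taken from the d-portal codebase.

```python
from collections import Counter
from typing import Iterable, Optional

# IATI ActivityStatus codelist
ACTIVITY_STATUS = {
    "1": "Pipeline/identification",
    "2": "Implementation",
    "3": "Finalisation",
    "4": "Closed",
    "5": "Cancelled",
    "6": "Suspended",
}


def status_group(code: Optional[str]) -> str:
    """Map a published activity-status code to its group, or to "Other"."""
    if code is None:
        return "Other"  # activity-status is missing
    # Unknown values such as "8", or an empty string, also fall into "Other".
    return ACTIVITY_STATUS.get(str(code).strip(), "Other")


def count_by_status(codes: Iterable[Optional[str]]) -> Counter:
    """Count activities per status group."""
    return Counter(status_group(code) for code in codes)


# Hypothetical sample: two in implementation, one closed, three problem cases.
counts = count_by_status(["2", "2", "4", None, "8", ""])
print(counts["Implementation"], counts["Closed"], counts["Other"])  # 2 1 3
```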

There is also a question regarding the ending soon or starting soon activity lists, these will need rethinking if you want to focus on the published codes rather than dates.

stevieflow commented 5 months ago

Thanks @YohannaLoucheur @xriss

> We currently have 7389 activities missing status codes

I did a quick analysis. Whilst this will no doubt always be the case, a significant proportion of these activities are from a single publisher:

Activities with no activity-status

| prefix | reportingorg_ref | reportingorg_narrative | activities | COUNTA of iatiidentifier |
| --- | --- | --- | --- | --- |
| aiddata | US-501c3-522318905 | AidData | 6485 | 86.20% |
| france | | | 531 | 7.06% |
| | FR-6 | France, MAEDI | 103 | 1.37% |
| wwf-uk | GB-COH-4016725 | World Wildlife Fund | 266 | 3.54% |
| | GB-COH-1081247 | WWF-UK | 4 | 0.05% |
| spark | NL-KVK-41213450 | EN: Spark head office Amsterdam | 94 | 1.25% |
| unrwa | XM-DAC-41130 | United Nations Relief and Works Agency for Palestine Refugees in the Near East | 28 | 0.37% |
| weeffect | SE-ON-802004-1524 | We Effect (fd Kooperation Utan Gränser) | 2 | 0.03% |
| frankwater | GB-CHC-1121273 | EN: FRANK Water | 2 | 0.03% |
| wwf-se | SE-ON-802005-9823 | WWF/World Wide Fund for nature | 1 | 0.01% |
| kad | | | 1 | 0.01% |
| irishaid | | | 1 | 0.01% |
| fowdk | DK-CVR-78920610 | DA: Verdens Skove, EN: Forests of the World, ES: Bosques del Mundo | 1 | 0.01% |
| cr | | | 1 | 0.01% |
| buildafrica | | | 1 | 0.01% |
| ai_1064413 | | | 1 | 0.01% |
| acodev | BE-BCE_KBO-0462279234 | FR: ACODEV | 1 | 0.01% |
| gecfundmanagerpwc | GB-COH-03580586-GEC | Girls' Education Challenge - Fund Manager PwC | 0 | 0.00% |
| Grand Total | | | 7523 | 100.00% |

Source: IATI Tables (12 April 2024)

It's unclear what the utility of this publication is, so will investigate separately

Nevertheless, the "Other" box remains important. If we are to change the logic to be based on activity-status then I'd propose that Other is for the non-valid use of this code. We can work on surfacing date logic errors and other issues (particularly with closer ties to the IATI Validator) as new issues @YohannaLoucheur

> There is also a question regarding the ending soon or starting soon activity lists, these will need rethinking if you want to focus on the published codes rather than dates.

Thanks @xriss - and appreciate this proposal might start to impact on several long-standing things. I can create a new issue on those lists specifically - it might be useful to know the logic that underpins them, too

Thanks

xriss commented 5 months ago

BTW the place where this is a big change is the "active" activities which are about 50% at odds with the published "implementation" status

https://d-portal.org/ctrack.html?status_code=2#view=main

Gives an obvious comparison of the current way vs as published.

stevieflow commented 5 months ago

Wow, nice - thanks for highlighting this @xriss

I still say we proceed!

stevieflow commented 2 months ago

@xriss @notshi

It would be good to resolve this issue.

> Other is still going to be a needed category for when an invalid code is used or the code is missing.

Agreed

> There is also a question regarding the ending soon or starting soon activity lists, these will need rethinking if you want to focus on the published codes rather than dates.

I think we can continue to use activity-date for this. If the boxes with the activity counts have to be rejigged, I guess they will not then include any count of "Active activities" or "Ended activities" (as the boxes will reflect the activity-status), so the concept of "Ending soon" will not counter this.

Thanks!

stevieflow commented 1 month ago

Hi @notshi @xriss just wanted to check if we have any progress on this one? Thanks

notshi commented 4 weeks ago

Here are a couple of test pages for the new status section.

  1. https://d-portal.org/ctrack.html?test=1#view=main

This is a mixture of the current view with the 6 activity statuses at the top.

test1

  2. https://d-portal.org/ctrack.html?test=2#view=main

This is a complete rewrite of the status section, displaying the 6 activity statuses with total, other and reporting orgs.

This might be the most confusing option for 'first contact' with IATI data (to be honest, it confused us!) as the wordings are not intuitive (friendly) when compared to Active, Planned, Ended, etc.

This is also pushing d-portal to be more publisher-focused than user-focused, which d-portal was never intended to be in the first place. The latter is something we've always pushed for.

test2

stevieflow commented 4 weeks ago

Great, thanks @notshi

My gut reaction is to continue with option 2, as that is truer to the IATI standard

I do like the layout of the different statuses in the first option, as that looks fairly logical.

However, I think that then displaying activity numbers based on the dates logic continues our pain point, in that it contradicts the statuses

Maybe there is scope whereby we could group statuses into new categories such as "active", but that will in turn be an editorial choice we are making, one that sits outside of the standard

I think us using the statuses, and then highlighting to specific publishers the cases where their statuses and dates are out of sync, will be a key data quality activity going forward
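
One possible shape for such a status/date check is sketched below. It only covers the case raised at the top of this issue (status is still Implementation, but an actual end date has already passed); the function name `status_date_mismatch` and the sample dates are hypothetical, and other mismatch rules would need agreeing separately.

```python
from datetime import date
from typing import Optional


def status_date_mismatch(status_code: Optional[str], actual_end: Optional[date], today: date) -> bool:
    """True when an activity is published as Implementation (code 2)
    but already carries an actual end date in the past."""
    return status_code == "2" and actual_end is not None and actual_end < today


today = date(2024, 9, 5)
# Hypothetical activities as (status code, actual end date) pairs.
print(status_date_mismatch("2", date(2023, 6, 30), today))  # True: flag to the publisher
print(status_date_mismatch("4", date(2023, 6, 30), today))  # False: Closed with a past end date
print(status_date_mismatch("2", None, today))               # False: no actual end date published
```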

I fully appreciate your feedback in terms of the departure towards publishers rather than users however

I'll also speak with @IsabelBirds & @robredpath - and we can discuss

@YohannaLoucheur may also be interested in this

Thanks


One (minor) thing: should the number of Reporting Orgs have a comma separator?

notshi commented 4 weeks ago

Thanks, @stevieflow - will add a comma separator to Reporting Orgs.

We were wondering: if we changed the wording for the current boxes (i.e. Active, Planned, Ended, etc.) to something more descriptive and included activity start/end dates, that might be more useful for both users and publishers.

So instead we have these at the top of the standard statuses:

Activities with dates in 2024 (Active)
Activities with end date before 2024 (Ended)
Activities with start date after 2024 (Planned)

Screenshot_20240905_121841

or even this to differentiate the data

Screenshot_20240905_122417

stevieflow commented 4 weeks ago

Thanks @notshi - really interesting

Certainly agree it's a lot of numbers and boxes! But the designs are starting to cement, I can see

The second iteration is really interesting.

I had assumed that live d-portal's use of dates is based on today rather than the year we are in - is that not the case?

notshi commented 3 weeks ago

Yes, they are based on today but we figured it might be easier to explain and understand with a yearly granularity.

Of course, we will need to adjust the numbers accordingly if we go this route.
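
To illustrate why the numbers would shift, here is a rough sketch of the yearly-granularity wording from the earlier comment. It assumes "dates in 2024" means the start/end range overlaps the year, and that missing or inverted dates still go to Other; `classify_by_year` is a made-up name, not an existing d-portal function.

```python
from datetime import date
from typing import Optional


def classify_by_year(start: Optional[date], end: Optional[date], year: int) -> str:
    """Hypothetical year-based label using only the year of each date."""
    # Assumption: missing or inverted dates remain a data issue.
    if start is None or end is None or end < start:
        return "Other"
    if end.year < year:
        return "Ended"    # end date before the year
    if start.year > year:
        return "Planned"  # start date after the year
    return "Active"       # date range overlaps the year


# An activity that ended on 1 March 2024 has already ended when judged
# against a day in April 2024, but counts as Active for the year 2024.
print(classify_by_year(date(2020, 1, 1), date(2024, 3, 1), 2024))  # Active
```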

stevieflow commented 3 weeks ago

Just discussed with @IsabelBirds

Important / headline data is :

For other statuses, might it be possible to have a plain-ole table?

That leaves the activity-date related dates, which might then sit underneath these headlines

For other activities (activities without a date, or a status?) maybe we need to shade that differently, or something, to indicate this is a data-quality issue?

We appreciate that might start to change how we design / visualise this part of an activity. Maybe we can draw some diagrams/sketches or something?