IATI / D-Portal

http://d-portal.org/

Question - El Salvador WASH activities missing from search result #616

Closed andreaszenasidi closed 3 years ago

andreaszenasidi commented 3 years ago

A data user is trying to search for all WASH projects in El Salvador. When this search is run, 2 activities are not returned by d-portal, while we can find them in the Datastore search results (the query ends with `AND (recipient_country_code:(SV) OR transaction_recipient_country_code:(SV))&wt=csv&rows=187`).

The activities are:

  1. US-GOV-1-720201451689: Registry, d-portal
  2. ES-DIR3-E04585801-009-053589: Registry, d-portal

We would like to understand why these are not returned by d-portal. Thanks in advance!

notshi commented 3 years ago

Hi @andreaszenasidi

The d-portal search query is an AND search, so an activity needs to match every selected filter to be returned.

At first glance, it looks like the 2 missing activities do not list El Salvador (SV) as a recipient country, nor do they list Water and sanitation (140) as a 3-digit sector group.

Activity 1 does list SV as a recipient country in transactions but does not list 140 as a sector group.

The datastore search has included all 5-digit sectors starting with 140 as well as the 3-digit sector group so that's a different search from the one made on d-portal.

Hope this clarifies things.

notshi commented 3 years ago

Hi @andreaszenasidi looks like there have been some changes to the data from yesterday's nightly import (I can see that SV is now listed as a recipient country for instance) and there are 2 more activities included in the search results. Are these the two missing activities?

andreaszenasidi commented 3 years ago

@notshi thanks for your fast reply. Just to double-check that I understand correctly how d-portal works: if I select Sector Group 140 and recipient country El Salvador, this will search the sector element for code 140 and all associated 5-digit DAC codes. Is that correct?

The data did change by today and the second (ES-DIR3-E04585801-009-053589) activity is now in d-portal. The first is not, but you are right that there are no transactions that have both 140 sector AND recipient country SV.

notshi commented 3 years ago

@andreaszenasidi Yes, that's right.

However, there is an extra element on top of this search that also applies to how d-portal understands and visualises the data from IATI. This was a legacy decision as d-portal was originally designed to work with version 1 of the standard.

One of the things we do is split countries and sectors into percentages assigned to the activity. We recreate this percentage split using published transactions, i.e. similar to what is done here, but if a publisher does not report any commitments, d-portal will use 'expenditure+disbursements' instead. This may not catch every single reference to a country or sector inside an activity.

In the case of US-GOV-1-720201451689, we have assigned it to various countries based on commitments reported. Our method is listed in the FAQ where we look at commitments, expenditures and disbursements with commitments taking priority.

As you can see, whilst SV is listed in disbursements, SV is not listed in commitments and that is why this activity is not listed.
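To illustrate the priority rule described above, here is a minimal Python sketch. It is not d-portal's actual code: the record layout, the `country_split` name and the GT/HN/SV figures are all made up for the example.

```python
from collections import defaultdict

def country_split(transactions):
    """Return {country: percentage}, using commitments if any exist,
    otherwise expenditure + disbursements (hypothetical helper)."""
    def totals(kinds):
        sums = defaultdict(float)
        for t in transactions:
            if t["type"] in kinds:
                sums[t["country"]] += t["value"]
        return sums

    sums = totals({"commitment"})
    if not sums:
        # No commitments reported: fall back to spending transactions.
        sums = totals({"expenditure", "disbursement"})
    total = sum(sums.values())
    if total == 0:
        return {}
    return {c: 100.0 * v / total for c, v in sums.items()}

# Made-up transactions: SV appears only in a disbursement.
txs = [
    {"type": "commitment", "country": "GT", "value": 300.0},
    {"type": "commitment", "country": "HN", "value": 100.0},
    {"type": "disbursement", "country": "SV", "value": 50.0},
]
split = country_split(txs)
print(split)  # {'GT': 75.0, 'HN': 25.0}
```

Because commitments exist, the disbursement to SV is never looked at, so a filter on SV would not match this activity.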

This is a good IATI data example of an edge case.

If a publisher wants a country to be associated with an activity, they should list it in the top country/sector element or publish a commitment to that country within the transactions.

The d-portal filtering is not a simple search due to the original country-centric design. To allow a simple search to pick up any mention of a country and/or sector, like the Datastore, is a distinct change and will require development time to look into.

notshi commented 3 years ago

@andreaszenasidi We've taken a look at the source code and have added a change.

Any country or sector referenced in transactions will now be set as at least 0% which means they will be listed in the search, even though they are not considered a fraction in the activity when it comes to splitting values.
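As a rough sketch of the idea (again, not the actual d-portal source; `with_zero_entries` and the sample data are hypothetical), the change amounts to something like this:

```python
def with_zero_entries(split, transactions):
    """Add a 0% entry for every country mentioned in any transaction
    that is missing from the percentage split (hypothetical helper)."""
    result = dict(split)
    for t in transactions:
        result.setdefault(t["country"], 0.0)
    return result

# Made-up split from commitments, plus a disbursement that mentions SV.
split = {"GT": 75.0, "HN": 25.0}
txs = [
    {"type": "commitment", "country": "GT", "value": 300.0},
    {"type": "commitment", "country": "HN", "value": 100.0},
    {"type": "disbursement", "country": "SV", "value": 50.0},
]

searchable = with_zero_entries(split, txs)
print(searchable)          # {'GT': 75.0, 'HN': 25.0, 'SV': 0.0}
print("SV" in searchable)  # True
```

The existing percentages are untouched, so value splitting is unchanged; only the set of countries an activity can be found under grows.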

This new change means you should get the results you would expect when you use the d-portal search filter.

However, do let us know if that is not the case.

andreaszenasidi commented 3 years ago

@notshi thanks very much for the detailed explanation and for the change made. The results are indeed much more similar.

I do have one remaining question. Based on your explanation above, the activity ES-DIR3-E04585801-009-047903 should show up in the d-portal results; however, it doesn't. You can find this activity in this dataset.

notshi commented 3 years ago

@andreaszenasidi Looks like there is a duplicate activity id used in this dataset by the same publisher.

When that is the case, d-portal picks just one during import, whichever is the latest; the order of import is intentionally randomised.

In this case, the activity that has been picked does not list SV as a recipient-country or 140 as a sector group.
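A simplified Python sketch of that behaviour, assuming "latest" means the last record processed during import (the `import_activities` name and the record fields are hypothetical, not taken from d-portal):

```python
import random

def import_activities(records, rng=random):
    """Import records in a shuffled order; for a repeated id,
    the record processed last wins (hypothetical helper)."""
    order = list(records)
    rng.shuffle(order)
    by_id = {}
    for rec in order:
        by_id[rec["id"]] = rec  # a later record overwrites an earlier one
    return by_id

# Two made-up records sharing one activity id, with different content.
records = [
    {"id": "X-1", "countries": ["SV"], "sectors": ["140"]},
    {"id": "X-1", "countries": ["GT"], "sectors": ["110"]},
]

imported = import_activities(records, random.Random(0))
print(len(imported))  # 1
```

Only one of the two records survives, and which one depends on the shuffle, so a search for SV and 140 may or may not find the activity.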

andreaszenasidi commented 3 years ago

@notshi oh I see, thanks for clarifying that. This was very helpful and thanks for your timely responses. I will pass these on to the data user.