codeforIATI / iati-datastore

An open-source datastore for IATI data with RESTful web API providing XML, JSON, CSV plus ETL tools
https://datastore.codeforiati.org

Clarity on timings of updates #258

Open bill-anderson opened 3 years ago

bill-anderson commented 3 years ago

Is your feature request related to a problem? Please describe.

As of now (11:00 on 2 June), the IATI Registry says UK FCDO data was updated 10 hours ago, while the Datastore says it was last updated 4 hours ago.

The budget data I've downloaded doesn't appear to be the latest as it doesn't match what is on Devtracker. (Devtracker has in the past been based on published IATI data. I assume this remains the case ...)

For example, for Project GB-1-203385 ("Sum of May Split value" is from the datastore):

[Image: d4d]

Describe the solution you’d like

The interface should show the start and end times of the data capture, as well as the time since the last completed process. Ideally, there would also be a logfile showing both extract and load times for each publisher file.

andylolz commented 3 years ago

Thanks for this, @bill-anderson.

I assume we’re talking about fcdo-np here, since that’s where Project GB-1-203385 lives.

> IATI Registry says UK FCDO data was updated 10 hours ago

I guess you mean this timestamp:

[Screenshot 2021-06-02 at 12 52 18]

^^ This is when the metadata was last updated. The dataset in question hasn’t been updated so recently.

> Datastore says it was last updated 4 hours ago.

I guess this is from the homepage? This is just for an overview – there are more granular timestamps per dataset. So for fcdo-np, the datastore provides this metadata (albeit in JSON): https://datastore.codeforiati.org/api/1/about/dataset/fcdo-np/

This shows the data in question was successfully fetched last night, and parsed this morning. So I think the datastore is up-to-date.
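To make the difference between these timestamps concrete, here is a minimal Python sketch. It does not call the datastore, and the names `data_updated`, `fetched` and `parsed` are hypothetical labels for the times discussed above, not field names taken from the API's JSON. The values are illustrative only.

```python
from datetime import datetime, timezone


def is_up_to_date(data_updated: datetime, fetched: datetime, parsed: datetime) -> bool:
    """True if the file was fetched after its data last changed and parsed after that fetch."""
    return data_updated <= fetched <= parsed


def hours_ago(then: datetime, now: datetime) -> float:
    """Age of a timestamp in hours, relative to `now`."""
    return (now - then).total_seconds() / 3600


# Illustrative values only; these are not the real timestamps for fcdo-np.
utc = timezone.utc
now = datetime(2021, 6, 2, 11, 0, tzinfo=utc)
data_updated = datetime(2021, 5, 26, 0, 0, tzinfo=utc)  # when the file's data last changed
fetched = datetime(2021, 6, 1, 23, 0, tzinfo=utc)       # "fetched last night"
parsed = datetime(2021, 6, 2, 7, 0, tzinfo=utc)         # "parsed this morning"

print(is_up_to_date(data_updated, fetched, parsed))
print(hours_ago(parsed, now))
```

The point is that a recent registry metadata timestamp does not enter the check at all: only the time the data itself changed matters for whether the datastore copy is current.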

I’m not sure about the comparison with devtracker, but I’m interested to explore that more. What exactly are the figures in the table you shared? I guess it’s a sum of the transactions on the activities at the hierarchy below, for May 2021?

markbrough commented 3 years ago

@bill-anderson as I understand it, data for Devtracker is now generated from the v2 IATI Datastore.

I think there are still some significant issues with DSv2 updating (I previously reported this here). See, for example, that activity in DSv2: https://iatidatastore.iatistandard.org/search/activity?q=iati_identifier:(GB-1-203385)&wt=xslt&tr=activity-xml.xsl&rows=1 last-updated-datetime="2021-02-02T00:00:00Z"

The same activity in Datastore Classic: https://datastore.codeforiati.org/api/1/access/activity.xml?iati-identifier=GB-1-203385 last-updated-datetime="2021-05-26T00:00:00"
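For anyone wanting to repeat this comparison, a rough Python sketch follows. It works on inline XML snippets rather than fetching the URLs above, and the helper name `last_updated` is made up for illustration. One assumption to flag: the two values differ in form (one ends in `Z`, the other has no offset), so the sketch treats an offset-less timestamp as UTC before comparing.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def last_updated(activity_xml: str) -> datetime:
    """Return the last-updated-datetime of an <iati-activity> as an aware UTC datetime."""
    root = ET.fromstring(activity_xml)
    activity = root if root.tag == "iati-activity" else root.find(".//iati-activity")
    raw = activity.attrib["last-updated-datetime"]
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    stamp = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if stamp.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc)


# The two attribute values quoted in this comment, wrapped in minimal elements.
dsv2 = '<iati-activity last-updated-datetime="2021-02-02T00:00:00Z"/>'
classic = '<iati-activity last-updated-datetime="2021-05-26T00:00:00"/>'

lag = last_updated(classic) - last_updated(dsv2)
print(lag.days)  # 113
```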

There were no parse errors according to Datastore Classic: https://datastore.codeforiati.org/api/1/error/dataset/fcdo-np/

Tagging @rorads and @siemvaessen in case I am mistaken here.

siemvaessen commented 3 years ago

According to the data source, no financial data is reported at all inside this (parent) activity, as it refers to all the related (child) activities; see https://aida.tools/activity/GB-1-203385/description#relations. See a child activity, https://aida.tools/activity/GB-1-203385/description#relations, for example. In the relations view one can see the reference back to the parent again: https://aida.tools/activity/GB-1-203385-106/description#relations We may build a nice network vis in AIDA to clarify and explore :-)

Hope that clarifies, @bill-anderson, as the data you are seeing for GB-1-203385 is correct and complete.

@markbrough this is not related to data updates, but data modelling imo.
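The parent/child modelling described above can be sketched in Python as follows. This is only an illustration: the XML is made-up sample data (identifiers and amounts are not real), `child_total` is a hypothetical helper, and it simply adds up every transaction value on the children for a given month, ignoring transaction type and currency. The one fact it relies on from the standard is that `related-activity` with `type="2"` marks a child.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Made-up sample data for illustration; identifiers and values are not real.
SAMPLE = """
<iati-activities>
  <iati-activity hierarchy="1">
    <iati-identifier>XX-PARENT</iati-identifier>
    <related-activity ref="XX-PARENT-1" type="2"/>
    <related-activity ref="XX-PARENT-2" type="2"/>
  </iati-activity>
  <iati-activity hierarchy="2">
    <iati-identifier>XX-PARENT-1</iati-identifier>
    <transaction>
      <transaction-date iso-date="2021-05-10"/>
      <value>100.50</value>
    </transaction>
    <transaction>
      <transaction-date iso-date="2021-04-30"/>
      <value>999</value>
    </transaction>
  </iati-activity>
  <iati-activity hierarchy="2">
    <iati-identifier>XX-PARENT-2</iati-identifier>
    <transaction>
      <transaction-date iso-date="2021-05-20"/>
      <value>50</value>
    </transaction>
  </iati-activity>
</iati-activities>
"""


def child_total(xml_text: str, parent_id: str, month: str) -> Decimal:
    """Sum transaction values on a parent's child activities for a YYYY-MM month."""
    root = ET.fromstring(xml_text)
    by_id = {a.findtext("iati-identifier").strip(): a for a in root.iter("iati-activity")}
    parent = by_id[parent_id]
    # related-activity type "2" means "child" in the IATI RelatedActivityType codelist.
    child_ids = [r.get("ref") for r in parent.findall("related-activity") if r.get("type") == "2"]
    total = Decimal("0")
    for child_id in child_ids:
        child = by_id.get(child_id)
        if child is None:
            continue  # child not present in this file
        for tx in child.findall("transaction"):
            iso_date = tx.find("transaction-date").get("iso-date")
            if iso_date.startswith(month):
                total += Decimal(tx.findtext("value").strip())
    return total


print(child_total(SAMPLE, "XX-PARENT", "2021-05"))
```

The parent contributes nothing to the total because it carries no transactions of its own, which is why looking at the parent activity alone shows no financial data.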

On the matter of last-updated-datetime

see this convo from some weeks back: https://github.com/zimmerman-team/iati.cloud/issues/2595#issuecomment-834330989

The last-updated-datetime is the most abused IATI attribute I know of, as a bulk of publishers use systems that print a new timestamp whenever an activity gets processed, even if the elements that are part of that activity do not change. As a result, a major bulk of IATI data tricks users into believing something on an activity with a new timestamp has actually changed, while no data was changed in any of these elements. 😱

So basically the most distrusted attribute in this standard, imo, is the last-updated-datetime. Even though the standard has a very clear description, the bulk of IATI power producers ignore this. (We can blame SAP systems here, and internal data warehousing solutions..)

No idea why this is not being addressed, as just about everyone in the IATI community looks at this attribute as some core truth, which it is not.
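One way a consumer could work around an unreliable last-updated-datetime is to compare content rather than timestamps. The Python sketch below is an assumption about how that might be done, not something any of the datastores mentioned here is known to do: it drops the attribute, canonicalises the XML (requires Python 3.8+ for `ET.canonicalize`) and hashes the result. The name `content_fingerprint` and the sample activities are made up.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_fingerprint(activity_xml: str) -> str:
    """Hash an activity's content while ignoring last-updated-datetime."""
    root = ET.fromstring(activity_xml)
    for activity in root.iter("iati-activity"):
        activity.attrib.pop("last-updated-datetime", None)
    # C14N gives a stable serialisation (attributes in a fixed order);
    # strip_text trims whitespace around text, so indentation changes don't matter.
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up activities: same content, new timestamp and different formatting.
old = (
    '<iati-activity last-updated-datetime="2021-02-02T00:00:00Z" hierarchy="1">'
    "<title><narrative>Example</narrative></title></iati-activity>"
)
reprinted = (
    '<iati-activity hierarchy="1" last-updated-datetime="2021-05-26T00:00:00">\n'
    "  <title><narrative>Example</narrative></title>\n"
    "</iati-activity>"
)
edited = reprinted.replace("Example", "Example (revised)")

print(content_fingerprint(old) == content_fingerprint(reprinted))
print(content_fingerprint(old) == content_fingerprint(edited))
```

With a fingerprint like this stored per activity, a new timestamp with an unchanged fingerprint can be treated as a reprint rather than a real update.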