IATI / IATI-Datastore

An open-source datastore for IATI data with RESTful web API providing XML, JSON, CSV plus ETL tools
http://datastore.iatistandard.org/

Present last changed date in API for activity #33

Closed rufuspollock closed 11 years ago

rufuspollock commented 11 years ago

Store a 'last changed' date against each individual activity record [E]

last_changed = last_updated_datetime || last_modified of parent file (to be discussed further)

rufuspollock commented 11 years ago

This needs to be discussed further according to call today. Will review within IATI group and decide on business logic.

practicalparticipation commented 11 years ago

The main use case envisaged was consuming tools which wanted to fetch any activities which had changed since they last refreshed their data.

For example:

The last_updated_datetime property of individual activities is neither universally applied nor reliable. The last_modified of parent files would not work for the above use cases. Therefore this is a property that the store will need to work out for itself, and its semantics should be "when did the store first see this activity, or when did the store last see a change in this activity".

The last_changed date is primarily required as an index for querying (essential). However, it would also be useful to include it in output (desirable). When storing it in an XML blob, or including it in output, it could be placed within a custom namespace, e.g. 'store:last_changed', to separate it from the raw XML.
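A minimal sketch of the namespaced attribute idea, using Python's standard `xml.etree.ElementTree`. The namespace URI, the `annotate_last_changed` helper, and the choice to put the value on the `iati-activity` element as an attribute are all assumptions for illustration; nothing in this thread fixes them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for store-generated metadata (an assumption).
STORE_NS = "http://example.org/iati-datastore"
ET.register_namespace("store", STORE_NS)


def annotate_last_changed(activity_xml: str, last_changed: str) -> str:
    """Return the activity XML with a store:last_changed attribute added.

    The original elements and attributes are left untouched, so the
    store-generated value stays separate from the publisher's raw XML.
    """
    root = ET.fromstring(activity_xml)
    root.set(f"{{{STORE_NS}}}last_changed", last_changed)
    return ET.tostring(root, encoding="unicode")


raw = "<iati-activity><iati-identifier>XX-1</iati-identifier></iati-activity>"
annotated = annotate_last_changed(raw, "2013-06-01T12:00:00")
```

A consumer that does not know the `store` namespace can ignore the attribute, while one that does can read it back with the qualified name.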

practicalparticipation commented 11 years ago

Note that because some producing applications set last_updated_datetime as the time the IATI file was generated by an API, this is not a reliable measure at all of when the activity was last changed.

Other applications have addressed this by performing a string comparison of a stored iati-activity element and an incoming iati-activity element with the last_updated_datetime field removed (using a regex / xml dom).
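As a rough sketch of that comparison (using an XML DOM rather than a regex), the following drops the timestamp attribute from both elements before comparing their serialised form. It assumes the timestamp is carried as a `last-updated-datetime` attribute on the `iati-activity` element; the helper names are hypothetical, and this is not the code of any of the applications mentioned.

```python
import xml.etree.ElementTree as ET


def strip_timestamp(activity_xml: str, attr: str = "last-updated-datetime") -> str:
    """Serialise an iati-activity element with the timestamp attribute removed."""
    root = ET.fromstring(activity_xml)
    root.attrib.pop(attr, None)  # no error if the attribute is absent
    return ET.tostring(root, encoding="unicode")


def activity_changed(stored_xml: str, incoming_xml: str) -> bool:
    """True when the two activities differ in anything other than the timestamp."""
    return strip_timestamp(stored_xml) != strip_timestamp(incoming_xml)


old = '<iati-activity last-updated-datetime="2013-01-01T00:00:00"><title>A</title></iati-activity>'
regenerated = '<iati-activity last-updated-datetime="2013-06-01T00:00:00"><title>A</title></iati-activity>'
edited = '<iati-activity last-updated-datetime="2013-06-01T00:00:00"><title>B</title></iati-activity>'
```

Note that this is still a string comparison after parsing, so differences in whitespace or attribute order between the two versions would count as a change.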

practicalparticipation commented 11 years ago

Use case is:

User wants to fetch all activities that have changed since they last looked for data.

A user may use this in combination with a call to look for deleted activities to get their own system in sync with IATI data without doing a full refresh from the registry / the data store.

joetsoi commented 11 years ago

@practicalparticipation, We're storing the XML blobs, so the code just takes a hash of the existing and new activity for each resource, compares the two hash strings, and keeps the old 'last_change' date if they match. I've added a filter 'last_change' and added it to the JSON output for now.
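The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the datastore's actual code: the hash algorithm (SHA-256), the function names, and passing the current time in as `now` are all choices made here for the example.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def activity_hash(activity_xml: str) -> str:
    """Hash the raw XML blob of a single activity."""
    return hashlib.sha256(activity_xml.encode("utf-8")).hexdigest()


def resolve_last_change(
    stored_xml: Optional[str],
    stored_last_change: Optional[datetime],
    incoming_xml: str,
    now: datetime,
) -> datetime:
    """Keep the old date when the hashes match; otherwise stamp the activity with `now`.

    A missing stored record (first time the store sees the activity) also yields `now`.
    """
    if stored_xml is not None and stored_last_change is not None:
        if activity_hash(stored_xml) == activity_hash(incoming_xml):
            return stored_last_change
    return now


first_seen = datetime(2013, 5, 1, tzinfo=timezone.utc)
parse_time = datetime(2013, 6, 1, tzinfo=timezone.utc)
xml_v1 = "<iati-activity><title>A</title></iati-activity>"
xml_v2 = "<iati-activity><title>B</title></iati-activity>"
```

This matches the semantics proposed earlier in the thread: the date records when the store first saw the activity or last saw it change, independent of anything the publisher declares.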

practicalparticipation commented 11 years ago

Great. And as we're doing this at the activity level, I think it avoids the need to remove any generated date-time information (found in the parent element; the registry omits this from its hashes).

I've just checked and assume this isn't live yet. Will look out for when tagged for test.

joetsoi commented 11 years ago

It's currently live, but most of the activities will probably have an inaccurate last changed date as they were parsed prior to this being added.

practicalparticipation commented 11 years ago

Trying with http://iati-datastore.herokuapp.com/api/1/access/activity?last-change__gt=2013-06-01 I get a 'bad filter' error, and then, bizarrely, Chrome tells me that the page contains elements common on phishing sites...

Can you post example URL to test with?

joetsoi commented 11 years ago

@practicalparticipation, I didn't add the last-change filters to the validation stage that runs before the filtering. I've added them in the latest commit, and the URL you posted should be working now.
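For a consuming tool, building that query might look like the sketch below. The base URL and the `last-change__gt` parameter name are taken from the URL posted earlier in this thread; the `changed_since_url` helper is hypothetical, and the sketch only builds the URL without making a request.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint as posted earlier in this thread.
BASE_URL = "http://iati-datastore.herokuapp.com/api/1/access/activity"


def changed_since_url(since: date, base_url: str = BASE_URL) -> str:
    """Build a query for activities whose last-change date is after `since`."""
    query = urlencode({"last-change__gt": since.isoformat()})
    return f"{base_url}?{query}"


url = changed_since_url(date(2013, 6, 1))
```

A client would store the date of its last successful sync and pass it as `since` on the next run.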

The Chrome phishing warning is strange indeed. I'll have a poke around to see if I can find out what is triggering it.

practicalparticipation commented 11 years ago

This appears to be working - great.

bill-anderson commented 11 years ago

Ditto