IIIF / discovery


what to do about mass 'deletes' and 'creates'? #53

Closed: mixterj closed this issue 4 years ago

mixterj commented 5 years ago

In CONTENTdm it is not unusual for Collection Managers to create an entire collection of items (potentially tens of thousands), review the items, find a universal error, delete the entire collection, and then re-create it with the fix applied.

This would result in a massive number of Activity Streams 'Deletes'/'Creates' or 'Updates'. Is there an expectation that the Activity Stream provider would account for these local practices and not publish activities in real time for events like this?

thanks @shuddles for pointing this out!
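
For a sense of scale, here is a rough sketch (URLs and item counts hypothetical) of the activity entries such a delete-and-recreate cycle would generate, using the Create/Delete/object/endTime shape of the Change Discovery activity model:

```python
from datetime import datetime, timezone

def activity(event_type: str, object_id: str, object_type: str = "Manifest") -> dict:
    """Build one entry in the shape used by the IIIF Change Discovery activity model."""
    return {
        "type": event_type,  # e.g. "Create", "Update", "Delete"
        "object": {"id": object_id, "type": object_type},
        "endTime": datetime.now(timezone.utc).isoformat(),
    }

# Deleting and re-creating a 10,000-item collection naively yields 20,000 activities:
item_ids = [f"https://example.org/iiif/coll1/manifest/{i}" for i in range(10_000)]
stream = [activity("Delete", i) for i in item_ids] + [activity("Create", i) for i in item_ids]
print(len(stream))  # 20000
```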

aisaac commented 5 years ago

I agree that the scenario is relevant, but what kind of alternative representation would be possible? We still need to represent a big list of changes, don't we? Second point: is this case really different from listing all the collection's items when they are created in the first place? That is also a case where one could say the listing of activities is very verbose...

aisaac commented 5 years ago

Call 12-06-2019: we could have an activity that targets the entire collection of Manifests, and hope that clients infer from it the list of Manifests that changed. But in the case of a collection deletion, the collection itself would no longer be published!
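
A sketch of that collection-level alternative: a single activity whose object is the Collection itself (hypothetical URL), from which a client would re-enumerate the contained Manifests. As noted, this breaks down for deletions, since the Collection resource the client would need to dereference no longer exists.

```python
# One activity pointing at the Collection rather than at each Manifest (hypothetical URL):
collection_update = {
    "type": "Update",
    "object": {
        "id": "https://example.org/iiif/collection/coll1",
        "type": "Collection",
    },
    "endTime": "2019-06-12T12:00:00Z",
}
```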

aisaac commented 5 years ago

Call 12-06-2019: an alternative would be for the publisher to publish only the latest change for a given collection, thereby removing changes that are unnecessary from their perspective. But the stream would then become non-monotonic. Would this be a spec violation?
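
A minimal sketch of that coalescing idea, assuming activities shaped like the entries above: before publishing a page, keep only the most recent activity per object id. The earlier activities that get dropped are exactly what would make the published stream non-monotonic.

```python
def coalesce_latest(activities: list[dict]) -> list[dict]:
    """Keep only the most recent activity for each object id (input assumed oldest-first)."""
    latest: dict[str, dict] = {}
    for act in activities:
        latest[act["object"]["id"]] = act  # later entries overwrite earlier ones
    return sorted(latest.values(), key=lambda a: a["endTime"])
```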

aisaac commented 5 years ago

See also the suggestion from @mattmcgrattan on the 12-06-2019 call (https://docs.google.com/document/d/1TstlqzuXCt7f5Tc_vergqzKq5GmDX0z9NG1_7ymPioQ/): can implementers make decisions about "buffering updates", e.g. deciding which events get published and which do not, not advertising updates when there are more than N within some time frame, or waiting until the event stream on an object or set of objects has been stable for some period?

mattmcgrattan commented 5 years ago

Following on from the comment above, I guess it is an implementation decision for the publisher which events they consider significant (and therefore publish) and which not. And there are approaches that could be taken to handle sudden influxes of events on objects or collections in flux, to mitigate a flood of events for which they would not want a harvester to trigger a reindex.
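
One possible reading of that buffering approach (a sketch only; the spec does not prescribe this, and SETTLE_SECONDS, record and flush are hypothetical names): hold incoming activities per object and only release them to the published stream once the object has been quiet for some settle period.

```python
import time

SETTLE_SECONDS = 3600  # hypothetical quiet period before an object's changes are advertised

pending: dict[str, tuple[float, dict]] = {}  # object id -> (last-seen time, latest activity)

def record(activity: dict) -> None:
    """Buffer an activity instead of publishing it immediately."""
    pending[activity["object"]["id"]] = (time.time(), activity)

def flush(publish) -> None:
    """Release activities for objects that have been stable for SETTLE_SECONDS."""
    now = time.time()
    for object_id, (seen_at, act) in list(pending.items()):
        if now - seen_at >= SETTLE_SECONDS:
            publish(act)
            del pending[object_id]
```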

azaroth42 commented 5 years ago

Discussed on 2019-07-10 call - Clarify in the spec that: