bbcarchdev / spindle

RES Linked Open Data aggregation engine
https://bbcarchdev.github.io/spindle/
Apache License 2.0

Triggers loop #78

Closed by cgueret 8 years ago

cgueret commented 8 years ago

When looking at the status of all the proxy entities, it can be observed that the number of "completed" proxies drops regularly:

[screen shot 2016-06-23 at 15 13 43]

Using a PSQL interface to browse the data, it can be observed that some triggers are set to refresh a lot of resources:

[screen shot 2016-06-23 at 15 25 53]

In particular the following sequence generates the chain-saw pattern:

When everything from the Shakespeare data except _void-terms.nq is ingested, the problem is gone and all the proxies are steadily processed until completion. That is, removing all the triples having "http://data.vm-10-100-0-20.ch.bbcarchdev.net/terms#id" as a subject "solves" the issue at hand.

nevali commented 8 years ago

OK, the thing to look at here is what the triggers actually are, but the culprit will be the fact that https://github.com/bbcarchdev/spindle/blob/develop/twine/generate/triggers.c#L90 is indiscriminate. It may need to be made a bit smarter so that we track which triggers are freshly added and apply only those, rather than all of them. (Note that when the flags are -1, indicating that the proxy has been completely rebuilt, all of the triggers are, and should continue to be, considered 'freshly added'.)
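A rough sketch of the "apply only freshly-added triggers" idea, for illustration only. It does not reproduce the code in triggers.c: the `struct trigger` layout, the `fresh` field, the `FLAGS_FULL_REBUILD` name and the `apply_triggers()` function are all hypothetical. The only detail taken from the comment above is that a flags value of -1 means the proxy was completely rebuilt, in which case every trigger counts as fresh.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trigger record; not Spindle's actual structure */
struct trigger
{
	const char *target; /* identifier of the proxy to refresh */
	int fresh;          /* non-zero if added during the current update */
};

/* Assumed name for the "proxy completely rebuilt" flags value (-1) */
#define FLAGS_FULL_REBUILD (-1)

/* Callback invoked for each trigger that should fire */
typedef void (*trigger_fn)(const struct trigger *t, void *data);

/* Apply only the freshly-added triggers, or all of them after a full
 * rebuild. Returns the number of triggers that were applied.
 */
static size_t
apply_triggers(const struct trigger *list, size_t count, int flags,
	trigger_fn fn, void *data)
{
	size_t i, applied;

	assert(list != NULL || count == 0);
	applied = 0;
	for(i = 0; i < count; i++)
	{
		if(flags == FLAGS_FULL_REBUILD || list[i].fresh)
		{
			if(fn)
			{
				fn(&(list[i]), data);
			}
			applied++;
		}
	}
	return applied;
}
```

With something along these lines, an incremental update would only queue refreshes for targets whose triggers were added by that update, while a full rebuild would behave as it does today.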

cgueret commented 8 years ago

Ok! We'll have a look into it.

cgueret commented 8 years ago

When ingesting the Shakespeare dataset, it was the central #terms that was causing this issue. Each time a new image or video was processed, it would trigger an update on terms, which would in turn trigger an update on all the images and videos pointing to it. Acropolis was taking several steps back for every single step forward. After some testing we realised that we could prevent this from happening by being more cautious about what can trigger what in terms of updates. If we prevent, say, a MEDIA update from triggering a MEMBERSHIP update, the problem is gone.
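To illustrate the general idea of restricting what can trigger what, here is a minimal sketch. It is not the code from the commits: the `UPDATE_*` constants, their values, the `rules` table and the `allowed_kinds()` function are hypothetical, and the rules shown are just one possible policy in which a MEDIA update may not cause a MEMBERSHIP update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical update kinds, one bit each (values are assumptions) */
#define UPDATE_MEDIA      (1U << 0)
#define UPDATE_MEMBERSHIP (1U << 1)
#define UPDATE_ALL        (UPDATE_MEDIA | UPDATE_MEMBERSHIP)

/* Illustrative policy: which kinds of update each kind may trigger */
static const struct
{
	unsigned cause;
	unsigned may_trigger;
} rules[] = {
	{ UPDATE_MEDIA, UPDATE_MEDIA },           /* MEDIA never causes MEMBERSHIP */
	{ UPDATE_MEMBERSHIP, UPDATE_MEMBERSHIP }
};

/* Given the kinds of update that just happened (cause) and the kinds a
 * trigger asks for (requested), return the kinds that may be propagated.
 */
static unsigned
allowed_kinds(unsigned cause, unsigned requested)
{
	size_t i;
	unsigned allowed;

	/* Reject kinds this sketch does not know about */
	assert((cause & ~UPDATE_ALL) == 0);
	allowed = 0;
	for(i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
	{
		if(cause & rules[i].cause)
		{
			allowed |= rules[i].may_trigger;
		}
	}
	return requested & allowed;
}
```

Under this policy, processing a new image or video would no longer cause the shared terms entity to refresh everything that points to it, which is the loop described above.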

The relevant commits are:
https://github.com/bbcarchdev/spindle/commit/613f48f3e8257a24c24fdce2dc622ca071be7e67
https://github.com/bbcarchdev/spindle/commit/1206bd78f0430b78ca9d40b3a6d9fc1f9ffe3b5f