Closed beaudet closed 2 years ago
Ok I can see the validation report seems to skip certain triples. and it ends up with dangling blank nodes that are detached from the report instance. Probably a multi-threading issue with writing to the Graph. I am not sure whether Jena supports concurrent writes, but I guess the default memory graph does not.
I am not sure whether Jena supports concurrent writes, but I guess the default memory graph does not.
Where is this happening?
Jena supports concurrent writes with locks, critical sections (very old) and transactions on datasets.
If you add multiple triples, without a lock, chaos will result even if the data structures are not corrupted.
Right, there is no locking code or transactions in place, because the code isn't expecting concurrent execution ATM.
Are transactions needed even for a 100% read only scenario?
Ah, when the validation report is created new nodes are added to the validation report graph so that's probably where transactions are needed?
And any internal caching.
And passing values into the closure for executor submit need checking (less likely a problem but possible under load if there isn't a write-barrier).
Roughly - expect the worse. It will happen under load on a virtual machine where thread pauses of many seconds are quite possible.
The proper long-term refactoring here would probably be to not create the report triples directly but instead create an intermediate Java data structure (that is thread-safe). But that's a larger change. Meanwhile I think this PR can be closed?
+1 on the suggested approach and to closing the PR.
… and threaded the contents inside the shapes loop.