TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0
217 stars 61 forks source link

Thread safe version of validateShapes #139

Open beaudet opened 2 years ago

beaudet commented 2 years ago

Swapped out WeakHashMap with a Guava variant and synchronized a number of methods and blocks that add validation result nodes to models. This seems to enable reliable multi-threaded validation within validateShapes and the JUnit tests run about 25% faster now so hopefully performance improvements will be even more noticeable with larger numbers of shapes and graphs.

HolgerKnublauch commented 2 years ago

Thanks. Note I will be traveling for the next 3 weeks and may not get into this before I am back, doing only light work.

beaudet commented 2 years ago

Sounds good. Enjoy your break!

HolgerKnublauch commented 2 years ago

Hi @beaudet, I finally had a breather to look into this PR. I see nothing fundamentally wrong here. And I can certainly believe that this will speed up most validation runs when run as a stand-alone application (e.g. command line tool).

My practical problem is that the code base here also needs to work in our product, which is a multi-user server. This complicates matters quite a bit. For example we would need to test the potential improvements in all kinds of contexts, i.e. with different number of active users. And I can, of course, not simply rely on Runtime.getRuntime().availableProcessors() when the true number of available processes would be quite different. I don't think I can simply give away all resources to a single user, e.g. what happens if someone starts a lengthy validation and then others try to do the same - the first user may block too many resources. I am frankly not experienced enough with such scenarios to be able to judge on the benefits, but I can certainly see the immediate workload that we would have here to make this work correctly and safely under all circumstances of our product.

BTW have you done further experiments to measure the benefits? If this is indeed sufficiently beneficial, I guess you could keep your fork alive while I am unable to merge it into our (main) product branch.

afs commented 2 years ago

Uss the ForkJoin pool.

beaudet commented 2 years ago

Happy to make that change if @HolgerKnublauch agrees that's a good approach. Another option might be to add additional signatures or options for validateShapes / validateAll that accepts an executor.

HolgerKnublauch commented 2 years ago

Yes thanks for the pointer, @afs . ForkJoinPool looks promising. I have created a Jira ticket on our internal tracker to investigate this whole topic, so I would hopefully find more time in the coming months to help here.