Fix/distinct on aggregate

comunica / sparqlee

⚙️ SPARQL expression evaluator library - Moved to @comunica/expression-evaluator

https://www.npmjs.com/package/sparqlee

14 stars 7 forks source link

Fix/distinct on aggregate #172

Closed jitsedesmet closed 1 year ago

jitsedesmet commented 1 year ago

This PR aims to fix #151.

I fear this issue will become rather large when we want to solve this issue in an appropriate manner. (This because the typing of the aggregate function lacks information to easilly implement this)

jitsedesmet commented 1 year ago

These first few commits result in a working implementation for the count aggregate. However, it also results in a major speed reduction since each binding in a list of bindings is evaluated seperatly instead of all at once. Two options here that I see: 1) The AggregateEvaluators know when they are handling a wildcard Expression and only than use this slow splitting ) behaviour 2) We change the returntype of our evaluators to return a list of terms. (I suspect this route is a lot of work)

I'm not sure yet how I would differentiate between the different viariable assignments.

jitsedesmet commented 1 year ago

Seems like I was looking in a wrong direction. I was also trying to implement the wildcard *, but the specs state that wildcars is only possible on the count aggregator https://www.w3.org/TR/sparql11-query/#rAggregate

(that's probably why I only saw how I could implement it for that aggregator hehe)

jitsedesmet commented 1 year ago

This PR is now ready. Note, it handles the requested task of fixing distinct. It however does not yet fix the problem when using wildcards. I will try to fix this problem next, in a sepperate PR.

coveralls commented 1 year ago

Coverage: 100.0%. Remained the same when pulling f398eced1a5478f4a0f8720e0c4fc3cb0c34bb74 on jitsedesmet:fix/distinct-on-aggregate into f27a75df62d7055452ae70b98f8edd42215062ab on comunica:master.

jitsedesmet commented 1 year ago

A hashing approach might me beter here but will result in more memory usage. @rubensworks I leave it up to you too decide on that. :)

jitsedesmet commented 1 year ago

I managed to also fix the wildcards in a somewhat easy way since we only need to handle them for count I think this approach is acceptable.

jitsedesmet commented 1 year ago

I started refactoring like you asked and got a little carried away. Imo this makes the aggregators way more readable. And less complex. The put function now has more cases to cover but since we established using the binds is confusing, this might not be a bad thing?

jitsedesmet commented 1 year ago

Let me still test this with Comunica and then this should be ready :)

jitsedesmet commented 1 year ago

Seems like there still was a small issue. This should now be resolved.

jitsedesmet commented 1 year ago

Retested it with Comunica, and the 3 tests suites pass.

rubensworks commented 1 year ago

Released as 2.3.1