Closed jitsedesmet closed 3 years ago
I think this was originally done because expressions in an aggregate expressions are a lot simpler, e.g. you can not nest EXISTS
in there, i.e. only RegularFunctions, not SpecialFunctions. But now I'm not so sure. I can not immediately find it in the spec. So could definitely make sense.
Be aware, aggregation is very statefull. There is some shenanigans in the AggregateEvaluator to provide a nice API (replacing functions on the first call), both outwards and inwards. Take care when implementing an async alternative.
The BaseAggregators should be able to be reused.
Thank you for the clarification.
It is clear that an AsyncAggregateEvaluator should have some locking or transaction mechanism in place to handle certain scenarios.
Within the BaseAggregators I see two aggregators that might have unexpected async behavior. This being GroupConcat and Sample, GroupConcat
because the aggregator is defined by an order. Sample
forms a problem to because it will return the first term.
I don't know how we can handle the unexpected behavior of GroupConcat
. Either we throw an error if we notice multiple puts are called at the same time or, we expect the user to await every put.
To handle all these problems we could also expect the user to use await between every call to the AyncAggregateEvaluator
but this raises the question on why to make one in the first place?
I'm open to suggestions on how to tackle these problems.
Result should be async
👍 Since everything else in Comunica is async, this shouldn't be a problem.
Either we throw an error if we notice multiple puts are called at the same time
We definitely do not want this
we expect the user to await every put.
Sounds reasonable to me
this raises the question on why to make one in the first place
Not sure what you mean by this
So, we expect a user using the aggregates GroupConcat
and Sample
are responsible for using await between every put so the behavior of these aggregators is as expected?
Than all that's left is finding a way to use transactions for the mentioned functions.
for using await between every put
No, I'd make put
async
, and await
it when it's invoked.
I don't get that. Assuming we use a transaction system on put so only one put is can be executed at a time, a user writing
await Promise.all([aggreg.put(a), aggreg.put(b), aggreg.put(c)]);
const res = await aggreg.result();
where aggreg
is a GroupConcat
migt expect a, b, c
. But, since it is not clear what will exectute first it might be b, a, c
. So than if a user wants a, b, c
. We can expect him to write
aggreg.put(a);
aggreg.put(b);
aggreg.put(c);
right?
I think this will be required:
await aggreg.put(a);
await aggreg.put(b);
await aggreg.put(c);
Yes, of course, I forgot the awaits.
After further research about how JS works and finding out that it doesn't have the concurrency problems other languages have. I need to revise my previous message.
This issue has been handled in the pull requests. I will close it.
@wschella I was wondering why
AgragateEvaluator
usesSyncEvaluatorConfig
. I was trying to make sure the custom-function implementation for #16 was usable in a bigger project like comunica. Within this project almost everything is async, so it wouldn't make any sense to ask a developer for sync extension functions. However, the project uses sparqlee'sAggragateEvaloator
which is build apon theSyncEvaluator
. Maybe we should make anAsyncAggragateEvaloator
for completeness?