AgragateEvaluator uses SyncEvaluator

jitsedesmet commented 3 years ago

@wschella I was wondering why AgragateEvaluator uses SyncEvaluatorConfig. I was trying to make sure the custom-function implementation for #16 was usable in a bigger project like comunica. Within this project almost everything is async, so it wouldn't make any sense to ask a developer for sync extension functions. However, the project uses sparqlee's AggragateEvaloator which is build apon the SyncEvaluator. Maybe we should make an AsyncAggragateEvaloator for completeness?

wschella commented 3 years ago

I think this was originally done because expressions in an aggregate expressions are a lot simpler, e.g. you can not nest EXISTS in there, i.e. only RegularFunctions, not SpecialFunctions. But now I'm not so sure. I can not immediately find it in the spec. So could definitely make sense.

Be aware, aggregation is very statefull. There is some shenanigans in the AggregateEvaluator to provide a nice API (replacing functions on the first call), both outwards and inwards. Take care when implementing an async alternative.

The BaseAggregators should be able to be reused.

jitsedesmet commented 3 years ago

Thank you for the clarification.

jitsedesmet commented 3 years ago

It is clear that an AsyncAggregateEvaluator should have some locking or transaction mechanism in place to handle certain scenarios.

Result should be async. When a user calls put and doesn't await, the output of result will be arbitrary. the result function should wait until all puts are done.
Locking init: since init changes the behaviour of some functions, it should not be called multiple times. We will need some kind of transaction system to make sure it is only called once. Even when put is called multiple times at once.

Within the BaseAggregators I see two aggregators that might have unexpected async behavior. This being GroupConcat and Sample, GroupConcat because the aggregator is defined by an order. Sample forms a problem to because it will return the first term. I don't know how we can handle the unexpected behavior of GroupConcat. Either we throw an error if we notice multiple puts are called at the same time or, we expect the user to await every put.

To handle all these problems we could also expect the user to use await between every call to the AyncAggregateEvaluator but this raises the question on why to make one in the first place?

I'm open to suggestions on how to tackle these problems.

rubensworks commented 3 years ago

Result should be async

👍 Since everything else in Comunica is async, this shouldn't be a problem.

Either we throw an error if we notice multiple puts are called at the same time

We definitely do not want this

we expect the user to await every put.

Sounds reasonable to me

this raises the question on why to make one in the first place

Not sure what you mean by this

jitsedesmet commented 3 years ago

So, we expect a user using the aggregates GroupConcat and Sample are responsible for using await between every put so the behavior of these aggregators is as expected? Than all that's left is finding a way to use transactions for the mentioned functions.

rubensworks commented 3 years ago

for using await between every put

No, I'd make put async, and await it when it's invoked.

jitsedesmet commented 3 years ago

I don't get that. Assuming we use a transaction system on put so only one put is can be executed at a time, a user writing

await Promise.all([aggreg.put(a), aggreg.put(b), aggreg.put(c)]);
const res = await aggreg.result();

where aggreg is a GroupConcat migt expect a, b, c. But, since it is not clear what will exectute first it might be b, a, c. So than if a user wants a, b, c. We can expect him to write

aggreg.put(a);
aggreg.put(b);
aggreg.put(c);

right?

rubensworks commented 3 years ago

I think this will be required:

await aggreg.put(a);
await aggreg.put(b);
await aggreg.put(c);

jitsedesmet commented 3 years ago

Yes, of course, I forgot the awaits.

jitsedesmet commented 3 years ago

After further research about how JS works and finding out that it doesn't have the concurrency problems other languages have. I need to revise my previous message.

I think result shouldn't be async. If we expect a user to do this we might as well expect him to await all puts before calling result. It would be pretty hard to implement a result function that awaits all puts. There would be some questions to be answered like 'what about puts that where called after calling results?' If these questions arise when implementing I don't think users will find it natural to work with either. Making result not async would give a clear signal to the user that he/she needs to await all puts.
The locking mechanism isn't needed since we could just check a simple state to see whether init has already been called.

jitsedesmet commented 3 years ago

This issue has been handled in the pull requests. I will close it.

comunica / sparqlee

AgragateEvaluator uses SyncEvaluator #86