Simplify/remove eventdaemon run requirements

nickygerritsen commented 5 years ago

Currently, the eventdaemon needs to be started at a specific time: after the contest has been created and the initial problem data has been loaded, but before any submission is received or the contest is started. Also it needs to run during state changes, like contest start, freeze, end, unfreeze and finalize. If it is not running at these specific times, downstream tools will not work (correctly). This is because of two reasons:

Upon startup of the daemon we insert create (or update) events for all 'static' data. If these are missing, downstream clients will not know of these when receiving other events that contain references to them. Note: it seems the CDS handles these cases by keeping track of what is still missing, but we should maybe not depend on this.
The eventdaemon checks the contest state every 50.000 microseconds and compares it to the previous state. If it changed, it will send out a state event.

If starting the event daemon at a later / 'wrong' point, we have the problem that events are currently internally ordered by their eventid, which is an auto-incrementing integer. This means we can not insert events at a later time somewhere that is not the end of the feed. If we could insert events at arbitrary points into the event feed, restarting / reloading any downstream clients should be enough to pick up the changes. Then implementing the two types of events the eventdaemon emits would be easy:

Instead of inserting the create / update events at the end of the feed, insert them at the beginning. Note: we still need some logic to determine if we actually need to insert them, especially if later this data changed.
Upon startup check the current contest state and determine which state events should already have happened. Insert all missing ones at the correct spot.

@tuupke told me about Sterm Brocot sequences and that we could maybe solve it with that, but I'm not sure yet how.

Any thoughts are welcome.

nickygerritsen commented 5 years ago

@tuupke told me about Sterm Brocot sequences and that we could maybe solve it with that, but I'm not sure yet how.

Ah, the Mediants and binary search thing seems what we need to generate a new number between two other numbers. And then we order by the fractions, right @tuupke ?

tuupke commented 5 years ago

Yes! You can generate the "normal" sequence in a few ways (easiest is: default either numerator or denominator to 1 and an autoincrement to the other), and then if you want to add something in between two entries, you add up both the numerators and denominators and you will end up with a fraction that lies between those two.

For instance, placing an element between 1/3 and 2/3 would then give (1+2)/(3+3) = 3/6 as a fraction, and 1/3 < 3/6 = 1/2 < 2/3. There is an edge case, where you generate a fibonacci sequence, but you can rebalance part of the sequence (or reduce fractions) if it turns out to be an issue

meisterT commented 5 years ago

Whenever you want to insert an event you would also want all clients to reread the feed from scratch. The CDS solves that issue by bouncing and not reusing any old id. Is there a plan for this on our side?

nickygerritsen commented 5 years ago

Whenever you want to insert an event you would also want all clients to reread the feed from scratch. The CDS solves that issue by bouncing and not reusing any old id. Is there a plan for this on our side?

I didn't know it didn't reuse any ID's. I don't totally get how the CDS does this automatically to downstream clients. It needs to tell clients 'don't use anything you got so far' right?

Anyway, when inserting an event we could just restart nginx, PHP FPM or Apache, making all clients reconnect. However that would indeed reuse existing ID's. However, if the newly inserted events would get new ID's, would that be a problem? Maybe it would because if clients request it with ?since_id=x they will not get it.

What we can do, is somewhere keep track of a 'random' value, for example in the contest table, let's call it event_bounce_count. Then we prefix all event ID's in the API with that ID (and a dash) and we increment it with one every time we bounce.

A simpler solution compared to Sterm Brocot might be:

Have a primary_id (unsinged int) and secondary_id (signed int) field for events, where primary_id is auto increment by default, while secondary_id defaults to 0.
When you want to insert an event before another event, decrement its secondary_id but keep its primary_id. Similar for after

This would mean we can only do this once for every primary_id, secondary_id combination, but maybe that is all that we need? These 5 state events need to be inserted exactly once and the initial create events as well, unless you restart the eventdaemon later. I'm not sure if we can fix that case easily.

meisterT commented 5 years ago

I didn't know it didn't reuse any ID's. I don't totally get how the CDS does this automatically to downstream clients. It needs to tell clients 'don't use anything you got so far' right?

If you reset the CDS, the clients will reconnect. They will connect with since_id param (or similar) and if that event ID is not known to the CDS it'll throw an error. Thus the client side cannot accidentally continue reading.

nickygerritsen commented 5 years ago

So then a event_bounce_count on our side would indeed work, right?

meisterT commented 5 years ago

All of this sounds quite complex. I think there must be an easier solution.

If we would have the "start the event daemon at any time" feature, then we only have to

fix the code
stop the daemon
wipe old events
restart the daemon

in order to achieve the same thing without any additional columns or two column IDs.

eldering commented 5 years ago

Instead of focussing on more complicated solutions that involve fractional or tuple IDs (which I don't object to in principle), I would prefer to first see if we can make (almost) sure that the event feed is uptodate. That would also save us from having to do server restarts and letting clients reread from scratch, which I think not an ideal solution. For example, to insert the state change events, we could pre-calculate those that still need to be inserted, and then cheaply query if one needs to be inserted before each event insert. That could mean that the state change event is generated some time after it actually happened, but it would still show up in the right order in the feed. Also, the create events are already partially generated, and we could do (a lot) better there. For example, when we create a contest, we can add all existing teams to that contest. Then later, if a new team is created (or added/removed from a private contest), we can explicitly insert the associated event.

nickygerritsen commented 5 years ago

All of this sounds quite complex. I think there must be an easier solution.

I hope there is...

If we would have the "start the event daemon at any time" feature, then we only have to

fix the code

stop the daemon

wipe old events

restart the daemon

in order to achieve the same thing without any additional columns or two column IDs.

I don't quite follow. You want to re-insert all events upon restart of the eventdaemon? So also the submissions, judgements, runs and clarifications? That might work if we then also insert past state events. We could even add a flag --clear to clear the event table automatically. I don't really get what you mean with 'fix the code'?

Instead of focussing on more complicated solutions that involve fractional or tuple IDs (which I don't object to in principle), I would prefer to first see if we can make (almost) sure that the event feed is uptodate. That would also save us from having to do server restarts and letting clients reread from scratch, which I think not an ideal solution.

I agree, but I'm not sure if we can.

For example, to insert the state change events, we could pre-calculate those that still need to be inserted, and then cheaply query if one needs to be inserted before each event insert. That could mean that the state change event is generated some time after it actually happened, but it would still show up in the right order in the feed.

Statistics on 'some time' for the world finals:

started: the first event after it happened 7 minutes and 21 seconds after contest start (a clarification).
frozen: less than one second after freeze.
ended: 30 ms after end.
finalized: no event happened after this, except for the thawed one.
thawed: this event is not in the feed. It did happen but we saved the feed before.

For started this is quite a big gap and I think this is something that is quite common. However, this might be a good idea for a 'fallback' solution if the event daemon is not running? To make sure the event is at least always there. I am not sure what exactly the 'pre-calculation' would be, because to me it seems it is not too much work to calculate this on the fly?

Also, the create events are already partially generated, and we could do (a lot) better there. For example, when we create a contest, we can add all existing teams to that contest. Then later, if a new team is created (or added/removed from a private contest), we can explicitly insert the associated event.

This would solve the created events indeed, but this is also quite some work. For example, adding a testcase would also result in a new event. However, this does feel like it is a real solution to the created events.

I am wondering whether it would be enough to:

Have only one event daemon that autodetects contests when they are created and activated, such that you can start it earlier, thus removing half of the 'you need to start it at a specific time' issue.
Implement #294 so we know a daemon is running.
Tell people that if they want referential integrity they should add the CDS after DOMjudge, which will take care if this
Implement @eldering his idea for the state events for when someone forgot to start the eventdaemon

This would require only minor changes, but does require us to have a CDS for referential integrity. One other option might be to make inserting events somehow smarter, by making sure it also inserts dependant events first.

meisterT commented 5 years ago

I think that Jaap is on the right track and we should try to get rid of the separate event daemon process. Relying on the CDS is even worse than this separate event daemon process, IMHO.

But I've good news :-). If currently no client is listening, it doesn't matter if we insert the state change later than the time where it actually happened (and we should do the cheap query that Jaap proposed) to maintain the correct order. If a client is listening to the feed, we have an already running process \o/ and could just do the static event creation right before the while loop and the state checks in the while loop: https://github.com/DOMjudge/domjudge/blob/master/webapp/src/DOMJudgeBundle/Controller/API/ContestController.php#L312

nickygerritsen commented 5 years ago

This sounds super good and I think also pretty easy, right? Let’s do it!

eldering commented 5 years ago

Well, it solves half the problem: state change events. Ideally, we'd also make the eventdaemon superfluous for initial create events. But anyways, we can already start implementing this.

For the state change events, we essentially have to move the code from the eventdaemon to the event-feed endpoint loop. But I would want to make sure that whatever SQL queries we do there are highly cacheable and fast. Also, we have to make sure that we never insert duplicate state change events from multiple client threads.

meisterT commented 5 years ago

I would also do the initial create events in the same way. As soon as the first client connects to a new contest, create them. If no client (ever) connects, create them on the first "dynamic" event.

nickygerritsen commented 5 years ago

These initial events are quite heavy to check on every dynamic event, I think. Do you have an idea how to do this efficiently?

nickygerritsen commented 5 years ago

Also, we have to make sure that we never insert duplicate state change events from multiple client threads.

We can do this with an sql lock I guess if we decide to actually insert it, as a double check. I’d not do the lock every time as that might be slow

eldering commented 5 years ago

Also, we have to make sure that we never insert duplicate state change events from multiple client threads.

We can do this with an sql lock I guess if we decide to actually insert it, as a double check. I’d not do the lock every time as that might be slow

Yes, we can actually make this an advisory lock (not sure that's what you meant too), so it doesn't actually lock any tables, like in the calcscorerow function.

meisterT commented 5 years ago

These initial events are quite heavy to check on every dynamic event, I think. Do you have an idea how to do this efficiently?

I assume (haven't thought it fully through yet) that you can check for referential integrity of this specific event before insertion. If that integrity is not given, do the heavy thing (aka insert everything static).

nickygerritsen commented 5 years ago

I assume (haven't thought it fully through yet) that you can check for referential integrity of this specific event before insertion. If that integrity is not given, do the heavy thing (aka insert everything static).

That might work indeed. Referential integrity check is like 4 events max I think, so that seems doable

DOMjudge / domjudge

Simplify/remove eventdaemon run requirements #552