Closed nickygerritsen closed 5 years ago
@tuupke told me about Sterm Brocot sequences and that we could maybe solve it with that, but I'm not sure yet how.
Ah, the Mediants and binary search thing seems what we need to generate a new number between two other numbers. And then we order by the fractions, right @tuupke ?
Yes! You can generate the "normal" sequence in a few ways (easiest is: default either numerator or denominator to 1 and an autoincrement to the other), and then if you want to add something in between two entries, you add up both the numerators and denominators and you will end up with a fraction that lies between those two.
For instance, placing an element between 1/3
and 2/3
would then give (1+2)/(3+3) = 3/6
as a fraction, and 1/3 < 3/6 = 1/2 < 2/3
. There is an edge case, where you generate a fibonacci sequence, but you can rebalance part of the sequence (or reduce fractions) if it turns out to be an issue
Whenever you want to insert an event you would also want all clients to reread the feed from scratch. The CDS solves that issue by bouncing and not reusing any old id. Is there a plan for this on our side?
Whenever you want to insert an event you would also want all clients to reread the feed from scratch. The CDS solves that issue by bouncing and not reusing any old id. Is there a plan for this on our side?
I didn't know it didn't reuse any ID's. I don't totally get how the CDS does this automatically to downstream clients. It needs to tell clients 'don't use anything you got so far' right?
Anyway, when inserting an event we could just restart nginx, PHP FPM or Apache, making all clients reconnect. However that would indeed reuse existing ID's. However, if the newly inserted events would get new ID's, would that be a problem? Maybe it would because if clients request it with ?since_id=x
they will not get it.
What we can do, is somewhere keep track of a 'random' value, for example in the contest table, let's call it event_bounce_count
. Then we prefix all event ID's in the API with that ID (and a dash) and we increment it with one every time we bounce.
A simpler solution compared to Sterm Brocot might be:
primary_id
(unsinged int) and secondary_id
(signed int) field for events, where primary_id
is auto increment by default, while secondary_id
defaults to 0
.secondary_id
but keep its primary_id
. Similar for afterThis would mean we can only do this once for every primary_id, secondary_id
combination, but maybe that is all that we need? These 5 state events need to be inserted exactly once and the initial create
events as well, unless you restart the eventdaemon later. I'm not sure if we can fix that case easily.
I didn't know it didn't reuse any ID's. I don't totally get how the CDS does this automatically to downstream clients. It needs to tell clients 'don't use anything you got so far' right?
If you reset the CDS, the clients will reconnect. They will connect with since_id
param (or similar) and if that event ID is not known to the CDS it'll throw an error. Thus the client side cannot accidentally continue reading.
So then a event_bounce_count
on our side would indeed work, right?
All of this sounds quite complex. I think there must be an easier solution.
If we would have the "start the event daemon at any time" feature, then we only have to
in order to achieve the same thing without any additional columns or two column IDs.
Instead of focussing on more complicated solutions that involve fractional or tuple IDs (which I don't object to in principle), I would prefer to first see if we can make (almost) sure that the event feed is uptodate. That would also save us from having to do server restarts and letting clients reread from scratch, which I think not an ideal solution. For example, to insert the state change events, we could pre-calculate those that still need to be inserted, and then cheaply query if one needs to be inserted before each event insert. That could mean that the state change event is generated some time after it actually happened, but it would still show up in the right order in the feed. Also, the create events are already partially generated, and we could do (a lot) better there. For example, when we create a contest, we can add all existing teams to that contest. Then later, if a new team is created (or added/removed from a private contest), we can explicitly insert the associated event.
All of this sounds quite complex. I think there must be an easier solution.
I hope there is...
If we would have the "start the event daemon at any time" feature, then we only have to
- fix the code
- stop the daemon
- wipe old events
- restart the daemon
in order to achieve the same thing without any additional columns or two column IDs.
I don't quite follow. You want to re-insert all events upon restart of the eventdaemon? So also the submissions, judgements, runs and clarifications? That might work if we then also insert past state events. We could even add a flag --clear
to clear the event table automatically.
I don't really get what you mean with 'fix the code'?
Instead of focussing on more complicated solutions that involve fractional or tuple IDs (which I don't object to in principle), I would prefer to first see if we can make (almost) sure that the event feed is uptodate. That would also save us from having to do server restarts and letting clients reread from scratch, which I think not an ideal solution.
I agree, but I'm not sure if we can.
For example, to insert the state change events, we could pre-calculate those that still need to be inserted, and then cheaply query if one needs to be inserted before each event insert. That could mean that the state change event is generated some time after it actually happened, but it would still show up in the right order in the feed.
Statistics on 'some time' for the world finals:
started
: the first event after it happened 7 minutes and 21 seconds after contest start (a clarification).frozen
: less than one second after freeze.ended
: 30 ms after end.finalized
: no event happened after this, except for the thawed
one.thawed
: this event is not in the feed. It did happen but we saved the feed before.For started
this is quite a big gap and I think this is something that is quite common. However, this might be a good idea for a 'fallback' solution if the event daemon is not running? To make sure the event is at least always there.
I am not sure what exactly the 'pre-calculation' would be, because to me it seems it is not too much work to calculate this on the fly?
Also, the create events are already partially generated, and we could do (a lot) better there. For example, when we create a contest, we can add all existing teams to that contest. Then later, if a new team is created (or added/removed from a private contest), we can explicitly insert the associated event.
This would solve the created
events indeed, but this is also quite some work. For example, adding a testcase would also result in a new event. However, this does feel like it is a real solution to the created
events.
I am wondering whether it would be enough to:
state
events for when someone forgot to start the eventdaemonThis would require only minor changes, but does require us to have a CDS for referential integrity. One other option might be to make inserting events somehow smarter, by making sure it also inserts dependant events first.
I think that Jaap is on the right track and we should try to get rid of the separate event daemon process. Relying on the CDS is even worse than this separate event daemon process, IMHO.
But I've good news :-). If currently no client is listening, it doesn't matter if we insert the state change later than the time where it actually happened (and we should do the cheap query that Jaap proposed) to maintain the correct order. If a client is listening to the feed, we have an already running process \o/ and could just do the static event creation right before the while loop and the state checks in the while loop: https://github.com/DOMjudge/domjudge/blob/master/webapp/src/DOMJudgeBundle/Controller/API/ContestController.php#L312
This sounds super good and I think also pretty easy, right? Let’s do it!
Well, it solves half the problem: state change events. Ideally, we'd also make the eventdaemon superfluous for initial create events. But anyways, we can already start implementing this.
For the state change events, we essentially have to move the code from the eventdaemon to the event-feed endpoint loop. But I would want to make sure that whatever SQL queries we do there are highly cacheable and fast. Also, we have to make sure that we never insert duplicate state change events from multiple client threads.
I would also do the initial create events in the same way. As soon as the first client connects to a new contest, create them. If no client (ever) connects, create them on the first "dynamic" event.
These initial events are quite heavy to check on every dynamic event, I think. Do you have an idea how to do this efficiently?
Also, we have to make sure that we never insert duplicate state change events from multiple client threads.
We can do this with an sql lock I guess if we decide to actually insert it, as a double check. I’d not do the lock every time as that might be slow
Also, we have to make sure that we never insert duplicate state change events from multiple client threads.
We can do this with an sql lock I guess if we decide to actually insert it, as a double check. I’d not do the lock every time as that might be slow
Yes, we can actually make this an advisory lock (not sure that's what you meant too), so it doesn't actually lock any tables, like in the calcscorerow function.
These initial events are quite heavy to check on every dynamic event, I think. Do you have an idea how to do this efficiently?
I assume (haven't thought it fully through yet) that you can check for referential integrity of this specific event before insertion. If that integrity is not given, do the heavy thing (aka insert everything static).
I assume (haven't thought it fully through yet) that you can check for referential integrity of this specific event before insertion. If that integrity is not given, do the heavy thing (aka insert everything static).
That might work indeed. Referential integrity check is like 4 events max I think, so that seems doable
Currently, the eventdaemon needs to be started at a specific time: after the contest has been created and the initial problem data has been loaded, but before any submission is received or the contest is started. Also it needs to run during state changes, like contest start, freeze, end, unfreeze and finalize. If it is not running at these specific times, downstream tools will not work (correctly). This is because of two reasons:
create
(orupdate
) events for all 'static' data. If these are missing, downstream clients will not know of these when receiving other events that contain references to them. Note: it seems the CDS handles these cases by keeping track of what is still missing, but we should maybe not depend on this.state
event.If starting the event daemon at a later / 'wrong' point, we have the problem that events are currently internally ordered by their
eventid
, which is an auto-incrementing integer. This means we can not insert events at a later time somewhere that is not the end of the feed. If we could insert events at arbitrary points into the event feed, restarting / reloading any downstream clients should be enough to pick up the changes. Then implementing the two types of events the eventdaemon emits would be easy:create
/update
events at the end of the feed, insert them at the beginning. Note: we still need some logic to determine if we actually need to insert them, especially if later this data changed.state
events should already have happened. Insert all missing ones at the correct spot.@tuupke told me about Sterm Brocot sequences and that we could maybe solve it with that, but I'm not sure yet how.
Any thoughts are welcome.