Closed iPurpl3x closed 2 years ago
posts.number
is calculated on php/backend side.
Is it possible to calculate it throught the database?
I think we could do this database side. But It'd require the use of a complex compound query to do I believe, I might be able to take a crack at it if work doesn't get crazy.
Well it's not just constructing the query, but also making sure that query automatically gets run while saving, and having ALL of that be DB logic (we can't get the value via query and then save it, we have to do it in one step, otherwise there's no point).
There are two options:
Personally I think a trigger makes most sense, for a migration it would require something like this https://stackoverflow.com/a/55023987/717181
We could also consider https://stackoverflow.com/a/7821630/13019074 as an option. IMO, the less raw statements in migrations, the better.
I don't think locking tables solves it, it will only create larger problems. When tables are locked nothing else can write into it, so instead of a duplicate key exception you will receive a waiting for lock timeout on instances with much activity. Because remember users have to wait on their post to be written to the database.
IMO, the less raw statements in migrations, the better.
Actually, this isn't a dealbreaker: in the migration, we could check whether the DB is mysql or postgres. Depending on the answer, we could either run one raw statement or the other. It's not ideal, but I don't think eloquent has a tool for creating triggers.
Further research suggests that using triggers or other logic inside the database counters the idea of an orm as an abstraction layer of the database. That makes sense. I'd still be open to using triggers though, it feels like the most reliable and performant solution.
We could also consider not using number, because as far as I can tell it has hardly any use in the frontend. I once tried to figure out for @Bokt whether the impact would be detrimental, but never really found an answer. Or perhaps we can generate the number in the serializer only by counting all previous posts of a discussion (which might be slow), make it a property only required on the frontend and use the Id for backend logic...
I would not be against switching to something like GUIDs for IDs if it can be done reasonably easily. In fact using GUIDs is standard practice where I work explicitly to avoid this kind of issue since GUIDs are based on time and randomization basically making it impossible to get two duplicates. (Assuming your using standard GUIDs and not one of the newer special version like the incrementing GUID)
as far as I can tell it has hardly any use in the frontend
It's used in the URL to navigate to a specific post (goToNumber). I suppose we could also use IDs, but that would make URLs messier. Not sure if we care about that though (there's already confusion from time to time when number doesn't match up).
Discourse seems to use a "number"-like system, might be worth investigating how they do it.
Dropping number
would also make merging/splitting discussions much easler.
Or perhaps we can generate the number in the serializer only by counting all previous posts of a discussion (which might be slow), make it a property only required on the frontend and use the Id for backend logic...
This sounds horribly expensive
I would not be against switching to something like GUIDs for IDs
This is a completely orthogonal topic and should be discussed separately.
This is a completely orthogonal topic and should be discussed separately.
Ah yes, my bad, I got confused on exactly what this issue was fixing. Now that I look deeper my thoughts are unrelated.
Having researched a bit about trigger scalability and performance the general consensus seems to be that they are okay ish if you do not use queries that hit/read many rows. As a trigger to update the number on a one million post discussion would do a count on all those rows I fear the result would be an increasing amount of table locks.
As such i was considering the following rough ideas
I bet there are more alternatives but let's first let this sink in.
@bartvb pointed me to insertUsing
which is part of the Eloquent Query Builder which allows inserting a value based on a query. I'm investigating whether this is something that can be used inside a Model too.
@BartVB pointed me to
insertUsing
which is part of the Eloquent Query Builder which allows inserting a value based on a query. I'm investigating whether this is something that can be used inside a Model too.
This is probably going to be the best way to solve this. The major challenge here will be composing insertUsing
with insertGetId
, which is used when creating new instances of models, so their ID can immediately be updated as part of the save
call.
In theory, this shouldn't be too challenging to implement if we create a util class with some helper methods, and test it carefully. I would also break out the functionality into a trait that Post
can use for better modularity. That being said, this feels like a generally useful feature for the Laravel community, so I've opened a discussion over at https://github.com/laravel/framework/discussions/39663.
@askvortsov1 what about this solution?
We abstract the logic of number
calculation into a new class or callable. That invokable class or callable would then receive the Post as an argument. The default logic would use $post->discussion->post_number_index
like before.
An alternative implementation that I'm considering is instead storing the latest number into redis, but if it doesn't exist do one query to retrieve it and set it into redis.
I know that redis also offers a concurrency blocking mechanism that Laravel implements in jobs/scheduled tasks, so that could be another implementation to consider although it would probably require delaying the saving action, eg:
$redis = get_redis();
$lock = $redis->lock('calculate-number-discussion-' . $post->discussion_id, 2);
if ($lock->acquire()) {
// .. logic
$lock->release();
}
Other options:
getSequence()
which gets the last number from a (discussion_id
, number
) table, gets the last value, and inserts a new row with an incremented number until it succeeds. Using a transaction is very similar though but depends on the code how feasible a transaction is.But it seems like insertUsing
is the proper way to do this.
I'll try and play around with some code.
A WIP solution had been started at https://github.com/flarum/framework/pull/3198 prior to the monorepo switch. Overall it seems correct, but there's an odd error when populating the prepared query. I intend to get this fixed for the v1.3 release.
Bug Report
Current Behavior As suggested by @luceos this error happens "when two posts are saved at the same time".
Steps to Reproduce I never saw this error myself, I just found it in the logs of one of our production instances.
Expected Behavior No Error.
Error Logs
Environment