doctrine / orm

Doctrine Object Relational Mapper (ORM)
https://www.doctrine-project.org/projects/orm.html
MIT License
9.86k stars 2.5k forks source link

Change getCountQuery methods visibility to protected #11400

Open Fabsolute opened 3 months ago

Fabsolute commented 3 months ago

I want to create my version of Doctrine Paginator but need to override the count query for better performance

greg0ire commented 3 months ago

getCountQuery() is used just once, inside count(), so why don't you just override count() instead? :thinking:

Fabsolute commented 3 months ago

getCountQuery() is used just once, inside count(), so why don't you just override count() instead? 🤔

I want to modify the count query generated by doctrine. If I override count it means I have to rewrite the whole getCountQuery function.

greg0ire commented 3 months ago

Ah yes I see. :bulb: May you disclose a bit more about why you need to create your own version of the paginator, and why the performance improvements cannot be contributed back? Are they about what's in the new version?

greg0ire commented 3 months ago

Retargeting to 3.2.x since this is no bugfix.

Fabsolute commented 3 months ago

Retargeting to 3.2.x since this is no bugfix.

Oh sorry! I just did it on the browser without checking the contribution document. So that is why I created for this branch :facepalm:

Ah yes I see. 💡 May you disclose a bit more about why you need to create your own version of the paginator, and why the performance improvements cannot be contributed back? Are they about what's in the new version?

Let me tell you my story. We are using mysql. We have more than 10 tables that have over 30 million rows. And we have filters and lists of these rows. The weird part is here. I couldn't figure out why but a basic count operation takes ~10 seconds. So for the listing page, the filter itself takes around 500ms(limit is salvation here) but every time doctrine paginator wants to count the whole table from scratch.

For a simple solution, I created a patch file for the doctrine paginator and added enableResultCache() function to the end of the query. And it improved the performance really good way. So these changes should handled a properly way.

I can override the count function and do a custom cache mechanism for each pagination. But we already using the doctrine result cache and it would be good to use the same implementation.

greg0ire commented 3 months ago

So a count query without any where clause takes ~10 seconds? Shouldn't you be fixing that instead? Can you post your issue on https://dba.stackexchange.com/ first? I'm curious to see if this is normal, and if it can be fixed at the database level.

Fabsolute commented 3 months ago

So a count query without any where clause takes ~10 seconds? Shouldn't you be fixing that instead? Can you post your issue on https://dba.stackexchange.com/ first? I'm curious to see if this is normal, and if it can be fixed at the database level.

I will do it but I am not the only one who suffers from this issue.

https://twitter.com/arvidkahl/status/1770160932179906863

greg0ire commented 3 months ago

Yeah browsing the web, I just could find some other occurrences of this:

Maybe what you could do if you're OK with approximations is override count() to use the query described in that last page I linked instead, using the DBAL instead of the ORM, because even with a result cache, you're always going to have a user who will have to warmup the cache (unless you do so yourself somehow).

Fabsolute commented 3 months ago

Maybe what you could do if you're OK with approximations is override count() to use the query described in that last page I linked instead

I am okay with approximations(I think caching is a kind of approximation) but we can't get an approximation with filters. Like select count(*) from item where type="digital". This is an imaginary query but a very similar version took around 6 seconds. It is not possible to do an approximation with statistics.

even with a result cache, you're always going to have a user who will have to warmup the cache (unless you do so yourself somehow).

It is a really old system that still running and the users of the system accepted their fate already 😞 So waiting for a warmup is okay for them. The cache of the bigger lists will be around 2-3 hours. So one user will be a little bit suffocated per filter in 2-3 hours is still a lot of lifesaver.

But on the other hand yes this might not be the correct solution. Maybe I should solve this specific problem of the system in our own way.

derrabus commented 3 months ago

getCountQuery() is not an official extension point yet. This PR would change that. In that case, I'd like to see functional tests that cover extending the QB by overriding this method.