Nooul / spring-boot-rest-api-helpers

Java Spring REST API helpers for quick query building through JSON inspired by react-admin and offering an alternative to RSQL / FIQL
MIT License
36 stars 21 forks source link

Performance problem with pagination. #21

Closed billk97 closed 2 years ago

billk97 commented 2 years ago

The library seems to have a performance problem with large tables. While the actual query is fast by it self, in order to serve pageable data it has to know the total amount of results. To achieve that it performs the bellow query, which is extremely slow 36s in the below example.

select distinct count(distinct message0_.id) as col_0_0_ from message message0_ inner join application_user applicatio1_ on message0_.created_by_id=applicatio1_.id where applicatio1_.id=5

The problem seems to be in the distinct which is added to the query. By simply removing the distinct as shown bellow the performance improvement was significant. Total table rows: 2.300.000 time with distinct: 32sec time without distinct: 0,9sec

select count(message0_.id) as col_0_0_ from message message0_ inner join application_user applicatio1_ on message0_.created_by_id=applicatio1_.id where applicatio1_.id=5

Sample image of the transactions as shown in our monitoring tool. image

zifnab87 commented 2 years ago

yeah but is the count correct for all the queries if distinct is removed? Also you removed both distinct in count column and outside.. that for sure has impact on correctness of total results due to possible duplicates also the filtering may be done on tables that dont have proper indices (do you have on created_id ?) ? Message id is uuid or incremental int? I will have a look when possible

On Wed, Feb 23, 2022, 10:35 Vasilis Konstantinou @.***> wrote:

The library seems to have a performance problem with large tables. While the actual query is fast by it self, in order to serve pageable data it has to know the total amount of results. To achieve that it performs the bellow query, which is extremely slow 36s in the below example.

select distinct count(distinct message0_.id) as col_00 from message message0_ inner join applicationuser applicatio1 on message0_.created_byid=applicatio1.id where applicatio1_.id=5

The problem seems to be in the distinct which is added to the query. By simply removing the distinct as shown bellow the performance improvement was significant. Total table rows: 2.300.000 time with distinct: 32sec time without distinct: 0,9sec

select count(message0_.id) as col_00 from message message0_ inner join applicationuser applicatio1 on message0_.created_byid=applicatio1.id where applicatio1_.id=5

Sample image of the transactions as shown in our monitoring tool. [image: image] https://user-images.githubusercontent.com/33226904/155290583-bd8f9041-6a5c-43af-9f07-1ca4f1c6d53c.png

— Reply to this email directly, view it on GitHub https://github.com/Nooul/spring-boot-rest-api-helpers/issues/21, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEAN3TQNZMTPHHA7I2NGSTU4SS6FANCNFSM5PDZO2CQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

billk97 commented 2 years ago

You are right, just removing the distinct will introduce duplicates in some scenarios. This can be easily demonstrated in your tests. However, depending on the use case the data could already be distinct for example a UUID or an incremental id in the DB. I added a pull request that will allow disabling the distinct in such scenarios. Please review it and comment if needed.
Examples: normal query GET /actors?filter={movies: 1, firstName: John} disabling distinct query GET /actors?filter={movies: 1, firstName: John, disableDistinct: true} Do you think there is a better way to disable the distinct?

zifnab87 commented 2 years ago

if the indices are right the mysql optimizer should have the same performance with or without distinct so you may need to look at the combination of indices. Also if it is uniqueness on uuid it will be way slower than ints since comparison is so much slower to see for uniqueness. I will have a look in the pull request. Thanks

Στις Τετ 23 Φεβ 2022 στις 1:48 μ.μ., ο/η Vasilis Konstantinou < @.***> έγραψε:

You are right, just removing the distinct will introduce duplicates in some scenarios. This can be easily demonstrated in your tests. However, depending on the use case the data could already be distinct for example a UUID or an incremental id in the DB. I added a pull request that will allow disabling the distinct in such scenarios. Please review it and comment if needed. Examples: normal query GET /actors?filter={movies: 1, firstName: John} disabling distinct query GET /actors?filter={movies: 1, firstName: John, disableDistinct: true} Do you think there is a better way to disable the distinct?

— Reply to this email directly, view it on GitHub https://github.com/Nooul/spring-boot-rest-api-helpers/issues/21#issuecomment-1048701343, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEAN3S7C7W5OBJKSYZCLO3U4TCSJANCNFSM5PDZO2CQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

zifnab87 commented 2 years ago

closed as changes were merged by #22