Closed marcomicera closed 4 years ago
As our queries will be mostly reading from non-updating data, we don't need cursors as they will be complicating our lives in terms of session management (which we also don't need). If we can restrict our browsing on indexed columns, I 'd recommend we do keyset pagination and always return ordered results (which actually makes sense).
Alright, doable with JOOQ's seek()
method.
Since we won't be having fixed-size pages, the limit
and offset
query params don't make much sense anymore. I see two ways:
We get rid of them and use a unique query parameter, e.g., page
. I can then emulate pagination by storing page boundaries (i.e., last page values of columns we order by) in a dedicated materialized view that we should then somehow keep updated (immediately with a trigger, deferred with a temporal trigger, or on-demand with a stored procedure).\
Example from that blog post:
select x.value, x.id, row_number() over(order by x.value, x.id) + 1 page_number from ( select t.id, t.value, case row_number() over(order by t.value, t.id) % 5 when 0 then 1 else 0 end page_boundary from t order by t.value, t.id ) x where x.page_boundary = 1
This will then yield:
| VALUE | ID | PAGE_NUMBER | |-------|-----|-------------| | 0 | 786 | 2 | | 1 | 250 | 3 | | 1 | 959 | 4 | | 2 | 229 | 5 | | 2 | 533 | 6 | <-- Before page 6 | 3 | 37 | 7 | | 3 | 768 | 8 |
We keep them and use the aforementioned table to determine the page number given a certain offset
value. Then we limit the result size with the LIMIT
SQL clause.
Both of these solutions require these materialized views that contain page boundaries. Am I missing a simpler solution, maybe?
- We keep them and use the aforementioned table to determine the page number given a certain
offset
value. Then we limit the result size with theLIMIT
SQL clause.
I'm doing this, but with page
and limit
as query parameters.
I was right in the middle of it when I realized that keyset pagination isn't really suited for our case as we're already filtering those entries for package names and versions all the time.
So for instance such a table:
create materialized view packages_page_limits as
select package_name,
id,
row_number()
over (order by package_name, id) + 1 page_number
from (
select id,
package_name,
case row_number()
over (order by package_name, id) % 100
when 0 then 1
else 0
end page_boundary
from packages
order by package_name
) as x
where x.page_boundary = 1;
select * from packages_page_limits limit 10;
package_name | id | page_number
----------------------------------------------------------+----------+-------------
aero.m-click:mcpdf | 526850 | 2
ai.h2o:xgboost4j | 592988 | 3
ai.tock:tock-bot-engine-jackson | 8812137 | 4
android.compatibility:android-support-v7-appcompat | 20842521 | 5
ant:ant-junit | 298684 | 6
app.myoss.cloud.boot:myoss-starter-web | 8548776 | 7
at.bestsolution.eclipse:org.eclipse.fx.ui.animation | 7573640 | 8
at.lux:imageanalysis | 7487539 | 9
au.com.codeka:carrot | 88521 | 10
au.com.redboxresearchdata.fascinator:plugin-sso-rapidaaf | 5939684 | 11
would only help me when returning all packages, if I'm not wrong.
I'm switching back to the classic (yet inconsistent & inefficient) offset-limit.
Materialized views are not a great solution, as they will need to be updated every time we add a new package. Also, for large tables, like callables
and edges
they will just replicate the whole (humongous) table with just an extra column. In sort, let's just do page
and limit
without pre-computing stuff. This is what most APIs do, so it will be more familiar with devs as well.
The biggest package in the current database is com.groupdocs:groupdocs-signature
, version 18.4
(371k callables). Postgres-based offset
/limit
pagination works nicely:
fasten_java=> \timing
fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
join modules m on m.package_version_id = pv.id
join callables c on c.module_id = m.id
where
c.is_internal_call is true
and p.package_name = 'com.groupdocs:groupdocs-signature'
AND pv.version = '18.4' limit 100 offset 45000;
Time: 365.154 ms
fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
join modules m on m.package_version_id = pv.id
join callables c on c.module_id = m.id
where
c.is_internal_call is true
and p.package_name = 'com.groupdocs:groupdocs-signature'
AND pv.version = '18.4' limit 100 offset 12000;
Time: 303.800 ms
I would avoid anything more sophisticated until there is a need for it.
It is indeed a bit slower if we order by callable.id
, but not prohibitively so:
fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
join modules m on m.package_version_id = pv.id
join callables c on c.module_id = m.id
where
c.is_internal_call is true
and p.package_name = 'com.groupdocs:groupdocs-signature'
AND pv.version = '18.4'
order by c.id
limit 100 offset 12000;
Time: 1477.363 ms (00:01.477)
I would avoid anything more sophisticated until there is a need for it.
Yeah, so we agree on this. I dropped the keyset pagination for the offset-limit one https://github.com/fasten-project/fasten/commit/d1b7acba0581acc0efe261bb4149ddbd8f49ef49. If you're ok with this, we can close this issue already.
@romatallinn you now have pagination! Same offset
and limit
query params as in the good old endpoints table. Should something change in the future, you'll be promptly informed!
@gousiosg can we close this?
This is how I see it:
If we don't mind about inconsistency and performance for high offset values, I'd keep it simple and I'd go for the classic limit & offset approach. @gousiosg any take on this?