The subquery to get the latest job for each sync can slow down the db query, especially when there are a lot of jobs for a given sync
With this commit we are now using a left join coupled with ROW_NUMBER to get the latest job
query plan before
Sort (cost=13349.24..13349.24 rows=1 width=345)
Sort Key: _nango_syncs.name
-> Group (cost=5504.46..13349.23 rows=1 width=345)
Group Key: _nango_syncs.id, _nango_sync_schedules.frequency, _nango_active_logs.log_id, _nango_sync_schedules."offset", _nango_sync_schedules.status, _nango_sync_schedules.schedule_id, _nango_sync_configs.models
-> Sort (cost=5504.46..5504.47 rows=1 width=271)
Sort Key: _nango_syncs.id, _nango_sync_schedules.frequency, _nango_active_logs.log_id, _nango_sync_schedules."offset", _nango_sync_schedules.status, _nango_sync_schedules.schedule_id, _nango_sync_configs.models
-> Nested Loop (cost=1.27..5504.45 rows=1 width=271)
-> Nested Loop (cost=0.85..5501.81 rows=1 width=271)
Join Filter: ((_nango_syncs.name)::text = (_nango_sync_configs.sync_name)::text)
-> Nested Loop Left Join (cost=0.85..7.53 rows=1 width=224)
-> Nested Loop (cost=0.57..5.02 rows=1 width=204)
-> Index Scan using idx_connectionid_name_where_deleted on _nango_syncs (cost=0.28..2.50 rows=1 width=87)
Index Cond: (nango_connection_id = 35050)
-> Index Scan using nango_sync_schedules_sync_id_deleted_index on _nango_sync_schedules (cost=0.29..2.51 rows=1 width=133)
Index Cond: ((sync_id = _nango_syncs.id) AND (deleted = false))
-> Index Scan using idx_sync_id_active_true on _nango_active_logs (cost=0.28..2.50 rows=1 width=36)
Index Cond: (sync_id = _nango_syncs.id)
Filter: ((type)::text = 'sync'::text)
-> Seq Scan on _nango_sync_configs (cost=0.00..5494.23 rows=4 width=67)
Filter: ((NOT deleted) AND active AND ((type)::text = 'sync'::text) AND (nango_config_id = 2696))
-> Index Only Scan using _nango_connections_pkey on _nango_connections (cost=0.41..2.63 rows=1 width=4)
Index Cond: (id = 35050)
SubPlan 1
-> Limit (cost=7844.74..7844.74 rows=1 width=40)
-> Sort (cost=7844.74..7854.20 rows=3786 width=40)
Sort Key: _nango_sync_jobs.updated_at DESC
-> Nested Loop (cost=0.86..7825.81 rows=3786 width=40)
-> Index Scan using idx_jobs_id_status_type_where_delete on _nango_sync_jobs (cost=0.56..4683.52 rows=4190 width=116)
Index Cond: (sync_id = _nango_syncs.id)
-> Memoize (cost=0.30..1.56 rows=1 width=54)
Cache Key: _nango_sync_jobs.sync_config_id
Cache Mode: logical
-> Index Scan using _nango_sync_configs_pkey on _nango_sync_configs _nango_sync_configs_1 (cost=0.29..1.55 rows=1 width=54)
Index Cond: (id = _nango_sync_jobs.sync_config_id)
Filter: (NOT deleted)
query plan after
Subquery Scan on data (cost=10351.61..10364.57 rows=4 width=353)
Filter: (data.job_row_number = 1)
-> Sort (cost=10351.61..10353.77 rows=864 width=377)
Sort Key: _nango_syncs.name
-> WindowAgg (cost=10287.87..10309.47 rows=864 width=377)
Run Condition: (row_number() OVER (?) <= 1)
-> Sort (cost=10287.87..10290.03 rows=864 width=406)
Sort Key: _nango_sync_jobs.sync_id, _nango_sync_jobs.created_at DESC
-> Nested Loop Left Join (cost=1.83..10245.73 rows=864 width=406)
-> Nested Loop (cost=1.27..5520.40 rows=1 width=274)
-> Nested Loop (cost=0.85..5517.76 rows=1 width=274)
Join Filter: ((_nango_syncs.name)::text = (_nango_sync_configs.sync_name)::text)
-> Nested Loop Left Join (cost=0.85..7.53 rows=1 width=224)
-> Nested Loop (cost=0.57..5.02 rows=1 width=204)
-> Index Scan using idx_connectionid_name_where_deleted on _nango_syncs (cost=0.28..2.50 rows=1 width=87)
Index Cond: (nango_connection_id = 35050)
-> Index Scan using nango_sync_schedules_sync_id_deleted_index on _nango_sync_schedules (cost=0.29..2.51 rows=1 width=133)
Index Cond: ((sync_id = _nango_syncs.id) AND (deleted = false))
-> Index Scan using idx_sync_id_active_true on _nango_active_logs (cost=0.28..2.50 rows=1 width=36)
Index Cond: (sync_id = _nango_syncs.id)
Filter: ((type)::text = 'sync'::text)
-> Seq Scan on _nango_sync_configs (cost=0.00..5510.18 rows=4 width=70)
Filter: ((NOT deleted) AND active AND ((type)::text = 'sync'::text) AND (nango_config_id = 2696))
-> Index Only Scan using _nango_connections_pkey on _nango_connections (cost=0.41..2.63 rows=1 width=4)
Index Cond: (id = 35050)
-> Index Scan using idx_jobs_id_status_type_where_delete on _nango_sync_jobs (cost=0.56..4683.52 rows=4181 width=132)
Index Cond: (sync_id = _nango_syncs.id)
Filter: (sync_config_id IS NOT NULL)
The subquery to get the latest job for each sync can slow down the db query, especially when there are a lot of jobs for a given sync With this commit we are now using a left join coupled with ROW_NUMBER to get the latest job
query plan before
query plan after
Issue ticket number and link
https://linear.app/nango/issue/NAN-1307/connection-long-loading-time
Checklist before requesting a review (skip if just adding/editing APIs & templates)