NangoHQ / nango

A single API for all your integrations.
https://www.nango.dev
Other
4.33k stars 382 forks source link

fix: /api/v1/sync is slow #2424

Closed TBonnin closed 4 days ago

TBonnin commented 5 days ago

The subquery to get the latest job for each sync can slow down the db query, especially when there are a lot of jobs for a given sync With this commit we are now using a left join coupled with ROW_NUMBER to get the latest job

query plan before

Sort  (cost=13349.24..13349.24 rows=1 width=345)
  Sort Key: _nango_syncs.name
  ->  Group  (cost=5504.46..13349.23 rows=1 width=345)
        Group Key: _nango_syncs.id, _nango_sync_schedules.frequency, _nango_active_logs.log_id, _nango_sync_schedules."offset", _nango_sync_schedules.status, _nango_sync_schedules.schedule_id, _nango_sync_configs.models
        ->  Sort  (cost=5504.46..5504.47 rows=1 width=271)
              Sort Key: _nango_syncs.id, _nango_sync_schedules.frequency, _nango_active_logs.log_id, _nango_sync_schedules."offset", _nango_sync_schedules.status, _nango_sync_schedules.schedule_id, _nango_sync_configs.models
              ->  Nested Loop  (cost=1.27..5504.45 rows=1 width=271)
                    ->  Nested Loop  (cost=0.85..5501.81 rows=1 width=271)
                          Join Filter: ((_nango_syncs.name)::text = (_nango_sync_configs.sync_name)::text)
                          ->  Nested Loop Left Join  (cost=0.85..7.53 rows=1 width=224)
                                ->  Nested Loop  (cost=0.57..5.02 rows=1 width=204)
                                      ->  Index Scan using idx_connectionid_name_where_deleted on _nango_syncs  (cost=0.28..2.50 rows=1 width=87)
                                            Index Cond: (nango_connection_id = 35050)
                                      ->  Index Scan using nango_sync_schedules_sync_id_deleted_index on _nango_sync_schedules  (cost=0.29..2.51 rows=1 width=133)
                                            Index Cond: ((sync_id = _nango_syncs.id) AND (deleted = false))
                                ->  Index Scan using idx_sync_id_active_true on _nango_active_logs  (cost=0.28..2.50 rows=1 width=36)
                                      Index Cond: (sync_id = _nango_syncs.id)
                                      Filter: ((type)::text = 'sync'::text)
                          ->  Seq Scan on _nango_sync_configs  (cost=0.00..5494.23 rows=4 width=67)
                                Filter: ((NOT deleted) AND active AND ((type)::text = 'sync'::text) AND (nango_config_id = 2696))
                    ->  Index Only Scan using _nango_connections_pkey on _nango_connections  (cost=0.41..2.63 rows=1 width=4)
                          Index Cond: (id = 35050)
        SubPlan 1
          ->  Limit  (cost=7844.74..7844.74 rows=1 width=40)
                ->  Sort  (cost=7844.74..7854.20 rows=3786 width=40)
                      Sort Key: _nango_sync_jobs.updated_at DESC
                      ->  Nested Loop  (cost=0.86..7825.81 rows=3786 width=40)
                            ->  Index Scan using idx_jobs_id_status_type_where_delete on _nango_sync_jobs  (cost=0.56..4683.52 rows=4190 width=116)
                                  Index Cond: (sync_id = _nango_syncs.id)
                            ->  Memoize  (cost=0.30..1.56 rows=1 width=54)
                                  Cache Key: _nango_sync_jobs.sync_config_id
                                  Cache Mode: logical
                                  ->  Index Scan using _nango_sync_configs_pkey on _nango_sync_configs _nango_sync_configs_1  (cost=0.29..1.55 rows=1 width=54)
                                        Index Cond: (id = _nango_sync_jobs.sync_config_id)
                                        Filter: (NOT deleted)

query plan after

Subquery Scan on data  (cost=10351.61..10364.57 rows=4 width=353)
  Filter: (data.job_row_number = 1)
  ->  Sort  (cost=10351.61..10353.77 rows=864 width=377)
        Sort Key: _nango_syncs.name
        ->  WindowAgg  (cost=10287.87..10309.47 rows=864 width=377)
              Run Condition: (row_number() OVER (?) <= 1)
              ->  Sort  (cost=10287.87..10290.03 rows=864 width=406)
                    Sort Key: _nango_sync_jobs.sync_id, _nango_sync_jobs.created_at DESC
                    ->  Nested Loop Left Join  (cost=1.83..10245.73 rows=864 width=406)
                          ->  Nested Loop  (cost=1.27..5520.40 rows=1 width=274)
                                ->  Nested Loop  (cost=0.85..5517.76 rows=1 width=274)
                                      Join Filter: ((_nango_syncs.name)::text = (_nango_sync_configs.sync_name)::text)
                                      ->  Nested Loop Left Join  (cost=0.85..7.53 rows=1 width=224)
                                            ->  Nested Loop  (cost=0.57..5.02 rows=1 width=204)
                                                  ->  Index Scan using idx_connectionid_name_where_deleted on _nango_syncs  (cost=0.28..2.50 rows=1 width=87)
                                                        Index Cond: (nango_connection_id = 35050)
                                                  ->  Index Scan using nango_sync_schedules_sync_id_deleted_index on _nango_sync_schedules  (cost=0.29..2.51 rows=1 width=133)
                                                        Index Cond: ((sync_id = _nango_syncs.id) AND (deleted = false))
                                            ->  Index Scan using idx_sync_id_active_true on _nango_active_logs  (cost=0.28..2.50 rows=1 width=36)
                                                  Index Cond: (sync_id = _nango_syncs.id)
                                                  Filter: ((type)::text = 'sync'::text)
                                      ->  Seq Scan on _nango_sync_configs  (cost=0.00..5510.18 rows=4 width=70)
                                            Filter: ((NOT deleted) AND active AND ((type)::text = 'sync'::text) AND (nango_config_id = 2696))
                                ->  Index Only Scan using _nango_connections_pkey on _nango_connections  (cost=0.41..2.63 rows=1 width=4)
                                      Index Cond: (id = 35050)
                          ->  Index Scan using idx_jobs_id_status_type_where_delete on _nango_sync_jobs  (cost=0.56..4683.52 rows=4181 width=132)
                                Index Cond: (sync_id = _nango_syncs.id)
                                Filter: (sync_config_id IS NOT NULL)

Issue ticket number and link

https://linear.app/nango/issue/NAN-1307/connection-long-loading-time

Checklist before requesting a review (skip if just adding/editing APIs & templates)

linear[bot] commented 5 days ago

NAN-1307 Connection long loading time