MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0
1.78k stars 320 forks source link

Add col `current_run_uuid` to `jobs` #2929

Closed wslulciuc closed 1 month ago

wslulciuc commented 1 month ago

This PR is a slightly optimized approach replacing https://github.com/MarquezProject/marquez/pull/2928 by adding jobs.current_run_uuid.

SQL Perf

EXPLAIN plan

Limit  (cost=77.73..77.74 rows=5 width=742)
  CTE jobs_view_page
    ->  Hash Left Join  (cost=1.11..2.23 rows=5 width=565)
          Hash Cond: (j_2.parent_job_uuid = p.uuid)
          ->  Seq Scan on jobs j_2  (cost=0.00..1.06 rows=5 width=385)
                Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL) AND ((namespace_name)::text = 'namespace1996501068'::text))
          ->  Hash  (cost=1.05..1.05 rows=5 width=29)
                ->  Seq Scan on jobs p  (cost=0.00..1.05 rows=5 width=29)
  CTE job_versions_temp
    ->  Seq Scan on job_versions j_3  (cost=0.00..2.61 rows=49 width=661)
          Filter: ((namespace_name)::text = 'namespace1996501068'::text)
  ->  Sort  (cost=72.89..72.90 rows=5 width=742)
        Sort Key: j.updated_at DESC
        ->  Nested Loop Left Join  (cost=68.61..72.83 rows=5 width=742)
              Join Filter: (j.uuid = jt.uuid)
              ->  Hash Left Join  (cost=8.48..12.24 rows=5 width=710)
                    Hash Cond: (jv.latest_run_uuid = f.run_uuid)
                    ->  Hash Right Join  (cost=1.44..5.20 rows=5 width=694)
                          Hash Cond: (r.uuid = j.current_run_uuid)
                          Filter: (((r.current_run_state)::text = ANY ('{RUNNING,COMPLETED,FAILED}'::text[])) OR (r.uuid IS NULL))
                          ->  Seq Scan on runs r  (cost=0.00..3.50 rows=50 width=24)
                          ->  Hash  (cost=1.38..1.38 rows=5 width=694)
                                ->  Hash Right Join  (cost=0.16..1.38 rows=5 width=694)
                                      Hash Cond: (jv.uuid = j.current_version_uuid)
                                      ->  CTE Scan on job_versions_temp jv  (cost=0.00..0.98 rows=49 width=32)
                                      ->  Hash  (cost=0.10..0.10 rows=5 width=678)
                                            ->  CTE Scan on jobs_view_page j  (cost=0.00..0.10 rows=5 width=678)
                    ->  Hash  (cost=6.97..6.97 rows=5 width=48)
                          ->  Subquery Scan on f  (cost=6.86..6.97 rows=5 width=48)
                                ->  HashAggregate  (cost=6.86..6.92 rows=5 width=48)
                                      Group Key: job_facets.run_uuid
                                      ->  Sort  (cost=6.77..6.79 rows=5 width=56)
                                            Sort Key: job_facets.lineage_event_time
                                            ->  Nested Loop  (cost=0.31..6.72 rows=5 width=56)
                                                  ->  Hash Join  (cost=0.16..1.38 rows=5 width=16)
                                                        Hash Cond: (jv2.uuid = j2.current_version_uuid)
                                                        ->  CTE Scan on job_versions_temp jv2  (cost=0.00..0.98 rows=49 width=32)
                                                        ->  Hash  (cost=0.10..0.10 rows=5 width=16)
                                                              ->  CTE Scan on jobs_view_page j2  (cost=0.00..0.10 rows=5 width=16)
                                                  ->  Index Scan using job_facets_run_uuid_index on job_facets  (cost=0.14..1.06 rows=1 width=56)
                                                        Index Cond: (run_uuid = jv2.latest_run_uuid)
              ->  Materialize  (cost=60.14..60.28 rows=5 width=48)
                    ->  Subquery Scan on jt  (cost=60.14..60.25 rows=5 width=48)
                          ->  HashAggregate  (cost=60.14..60.20 rows=5 width=48)
                                Group Key: j_1.uuid
                                ->  Hash Join  (cost=25.75..54.14 rows=1200 width=48)
                                      Hash Cond: (jtm.tag_uuid = t.uuid)
                                      ->  Hash Join  (cost=1.12..26.34 rows=1200 width=32)
                                            Hash Cond: (jtm.job_uuid = j_1.uuid)
                                            ->  Seq Scan on jobs_tag_mapping jtm  (cost=0.00..22.00 rows=1200 width=32)
                                            ->  Hash  (cost=1.06..1.06 rows=5 width=16)
                                                  ->  Seq Scan on jobs j_1  (cost=0.00..1.06 rows=5 width=16)
                                                        Filter: ((namespace_name)::text = 'namespace1996501068'::text)
                                      ->  Hash  (cost=16.50..16.50 rows=650 width=48)
                                            ->  Seq Scan on tags t  (cost=0.00..16.50 rows=650 width=48)

This PR also updates how the job type is display within the jobs list:

Screenshot 2024-10-17 at 12 19 23 AM
netlify[bot] commented 1 month ago

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
Latest commit caa49cf096d781e034f2d1b387f8bc1b7160bd84
Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/67116955b41c6600082f01a4
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.13%. Comparing base (05d16aa) to head (caa49cf). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2929 +/- ## ============================================ + Coverage 81.12% 81.13% +0.01% - Complexity 1505 1507 +2 ============================================ Files 268 268 Lines 7358 7364 +6 Branches 330 330 ============================================ + Hits 5969 5975 +6 Misses 1228 1228 Partials 161 161 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.