MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0
1.78k stars 320 forks

[FIX] Dataset query to get only the latest facet for each version #2859

Closed (sophiely closed 4 months ago)

sophiely commented 4 months ago

Problem

Closes: https://github.com/MarquezProject/marquez/issues/2860

Solution

Since the same facet type is replicated many times, we can partition the facets by dataset version and facet name and rank them by recency, so that we take only the most recent facet for each dataset uuid and type. The UI seems to display only one facet per type (facet name) and dataset version anyway, so we don't need to query as many facets (the extra rows are just duplicates).
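To illustrate the ranking idea, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the query from this PR: the table `facets_demo`, its columns (`dataset_version_uuid`, `name`, `created_at`, `facet`) and the sample rows are all hypothetical, and it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical in-memory table standing in for the real facet storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facets_demo (
        dataset_version_uuid TEXT,
        name TEXT,
        created_at TEXT,
        facet TEXT
    )
    """
)

# Made-up sample data: the "schema" facet is duplicated for v1 and v2.
rows = [
    ("v1", "schema", "2024-07-01T10:00:00", '{"fields": 1}'),
    ("v1", "schema", "2024-07-02T10:00:00", '{"fields": 2}'),
    ("v1", "documentation", "2024-07-01T10:00:00", '{"description": "a"}'),
    ("v2", "schema", "2024-07-03T10:00:00", '{"fields": 3}'),
    ("v2", "schema", "2024-07-03T09:00:00", '{"fields": 0}'),
]
conn.executemany("INSERT INTO facets_demo VALUES (?, ?, ?, ?)", rows)

# Rank facets inside each (dataset version, facet name) partition,
# newest first, and keep only the top-ranked row of each partition.
LATEST_FACETS_SQL = """
SELECT dataset_version_uuid, name, created_at, facet
FROM (
    SELECT f.*,
           ROW_NUMBER() OVER (
               PARTITION BY dataset_version_uuid, name
               ORDER BY created_at DESC
           ) AS facet_rank
    FROM facets_demo AS f
) AS ranked
WHERE facet_rank = 1
ORDER BY dataset_version_uuid, name
"""

latest = conn.execute(LATEST_FACETS_SQL).fetchall()
for row in latest:
    print(row)
```

With this sample data the five stored rows collapse to three, one per dataset version and facet name, which matches what the UI is described as displaying.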

netlify[bot] commented 4 months ago

Deploy Preview for peppy-sprite-186812 canceled.

| Name | Link |
|---|---|
| Latest commit | 75160fee832d0c8e5627b7f555d53b2093e2dc60 |
| Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/66a28c291668a40007d0014c |

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.75%. Comparing base (879031a) to head (2e76688).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##               main    #2859   +/-   ##
=========================================
  Coverage     84.75%   84.75%
  Complexity     1456     1456
=========================================
  Files           253      253
  Lines          6566     6566
  Branches        305      305
=========================================
  Hits           5565     5565
  Misses          850      850
  Partials        151      151
```


dkt-sophie-ly commented 3 months ago

> Amazing, thanks, do we have a similar problem elsewhere? Or is this the only instance of this problem that you have seen?

For now I don't think so, but of course I will let you know if I notice similar issues :)