MarquezProject / marquez-web

Marquez Web UI

Define popular datasets #63

Closed: jhubley closed this issue 3 years ago

jhubley commented 4 years ago

Right now, on load, a "popular datasets" label shows, but "popular" isn't actually defined: all the datasets load. So we need to calculate the number of edges for each dataset, and then filter the datasets so we only show those with an edge count in the top ten percent.
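A minimal sketch of the filtering described above, in TypeScript. The `Dataset` and `Edge` shapes, the `popularDatasets` name, and the choice to count both incoming and outgoing edges are assumptions made for illustration; the issue does not say how the UI represents lineage.

```typescript
// Hypothetical shapes: the issue does not specify how datasets or edges are stored.
interface Dataset {
  name: string;
}

interface Edge {
  source: string; // name of the upstream node
  target: string; // name of the downstream node
}

// Count the edges touching each dataset, then keep the top `fraction` by count.
function popularDatasets(
  datasets: Dataset[],
  edges: Edge[],
  fraction: number = 0.1
): Dataset[] {
  const counts = new Map<string, number>();
  for (const d of datasets) {
    counts.set(d.name, 0);
  }
  for (const e of edges) {
    // Assumption: an edge counts toward both of its endpoints.
    for (const name of [e.source, e.target]) {
      const current = counts.get(name);
      if (current !== undefined) {
        counts.set(name, current + 1);
      }
    }
  }

  const countOf = (d: Dataset): number => counts.get(d.name) || 0;

  // Keep at least one dataset whenever the list is non-empty.
  const keep =
    datasets.length === 0
      ? 0
      : Math.max(1, Math.ceil(datasets.length * fraction));

  // Sort a copy in descending order of edge count and take the first `keep`.
  return [...datasets].sort((a, b) => countOf(b) - countOf(a)).slice(0, keep);
}
```

Rounding up with `Math.ceil` means a small catalog still shows something under the label; whether ties at the cutoff should all be included is left open here.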

wslulciuc commented 4 years ago

Thanks for opening this issue, @jhubley! Your approach makes sense. We have been discussing authoring a Lineage API proposal for Marquez that can answer questions like "What are the top N datasets?" For example, combining the filters and limit arguments would return a list of top datasets:

GET /api/v1/lineage?filters=top&limit=10
{
  "datasets": [ ... ]
}

I'm cool with adding the "top datasets" logic to the UI as a short-term solution (that way this feature isn't blocked). Let me know your thoughts!
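For illustration only, here is how a client might build the request shown above. The endpoint is a proposal in this thread, not a confirmed Marquez API, and the `topDatasetsUrl` helper and the base URL passed to it are hypothetical.

```typescript
// Hypothetical helper: builds the URL for the proposed lineage endpoint.
// The path and the `filters` and `limit` parameters come from the example
// request in this thread; nothing here is a confirmed Marquez API.
function topDatasetsUrl(baseUrl: string, limit: number): string {
  const pairs: Array<[string, string]> = [
    ["filters", "top"],
    ["limit", String(limit)],
  ];
  const query = pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${baseUrl}/api/v1/lineage?${query}`;
}
```

Keeping the URL construction in one function would make it easy to swap the UI-side filtering for the server-side call if the proposal lands.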

jhubley commented 4 years ago

Awesome, Lineage API sounds excellent. I'll implement the 'top datasets' logic for now because it's not much work. Thanks!