MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Migrate `namespace` to be optional in `API` and `WEB` #2876

Open phixMe opened 3 months ago

phixMe commented 3 months ago

Currently, the Marquez endpoints all follow this pattern: http://localhost:5000/api/v1/namespaces/{namespace}/datasets/{dataset}

This makes namespaces effectively mandatory when working with the API, so the web UI includes a dropdown for selecting a namespace. This is not ideal for a few reasons:

  1. We either need to default to a namespace (the first in the list) or force the user to select one before anything else can be rendered.
  2. A user does not always know the namespace of the job or dataset they are looking for, because many of the OpenLineage (OL) integrations (Spark, Airflow, etc.) assign it.
  3. A long list of namespaces does not scale: a Marquez user with thousands of namespaces has to scroll a long way, and application rendering degrades.

Therefore, we need to handle namespaces differently across the board to address the problems listed above.

Proposal

Migrate the endpoints to take the namespace as a query parameter, as shown below, and forward the old format to the new one for a few releases (or forever):

# OLD
http://localhost:5000/api/v1/namespaces/{namespace}/datasets/{dataset}
# NEW
http://localhost:5000/api/v1/datasets/{dataset}?namespace={namespace}
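To make the forwarding idea concrete, here is a rough sketch of how an old-format URL could be rewritten into the proposed one. It is not how Marquez implements this: the function name `forward_old_url` and the regex are hypothetical, and covering `jobs` alongside `datasets` is an assumption based on the endpoints following the same pattern.

```python
import re
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Hypothetical pattern for the old layout. Only datasets and jobs are covered;
# treating jobs the same way as datasets is an assumption, not part of the proposal.
_OLD_PATH = re.compile(
    r"^(?P<prefix>/api/v1)/namespaces/(?P<namespace>[^/]+)"
    r"/(?P<kind>datasets|jobs)/(?P<name>[^/]+)$"
)


def forward_old_url(url: str) -> str:
    """Rewrite an old-format URL to the proposed format; return other URLs unchanged."""
    parts = urlsplit(url)
    match = _OLD_PATH.match(parts.path)
    if match is None:
        return url

    # The namespace is percent-encoded in the path; decode it once so that
    # urlencode() below does not double-encode it.
    namespace = unquote(match["namespace"])

    # Preserve any query parameters already present, then append the namespace.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("namespace", namespace))

    # The dataset or job name stays in the path exactly as it was sent.
    new_path = f"{match['prefix']}/{match['kind']}/{match['name']}"
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, urlencode(query), parts.fragment)
    )
```

In a real server the same mapping would more likely live in a redirect or a routing rule; the point here is only that the namespace moves from a path segment to a query parameter while everything else is preserved.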

Update the web UI to render all jobs and datasets, paginated of course, without regard to namespace. Include a namespace filtering mechanism with a typeahead select so that users with many namespaces can find their known namespaces and filter by them.
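As an illustration of the typeahead behavior, the matching logic could look something like the sketch below. The function name `suggest_namespaces`, the ranking (prefix matches before substring matches), and the default `limit` are all assumptions made for this example; the actual web UI would implement this in its own stack and might query the server instead of filtering in memory.

```python
from typing import Iterable, List


def suggest_namespaces(
    namespaces: Iterable[str], typed: str, limit: int = 10
) -> List[str]:
    """Return up to `limit` namespaces matching what the user has typed so far."""
    needle = typed.strip().lower()
    if not needle:
        # Nothing typed yet: suggest nothing rather than dumping the full list.
        return []

    prefix_hits: List[str] = []
    substring_hits: List[str] = []
    for ns in namespaces:
        lowered = ns.lower()
        if lowered.startswith(needle):
            prefix_hits.append(ns)
        elif needle in lowered:
            substring_hits.append(ns)

    # Prefix matches come first, then substring matches, each group sorted.
    return (sorted(prefix_hits) + sorted(substring_hits))[:limit]
```

Capping the result with `limit` keeps the rendered list short even when thousands of namespaces exist, which is the rendering concern raised in point 3 above.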

wslulciuc commented 1 month ago

The 0.50.0 release will allow listing jobs without a namespace via GET /api/v1/jobs; see https://github.com/MarquezProject/marquez/pull/2930. We will take a similar approach for datasets.