MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Design document is not publicly accessible #73

Closed: teabot closed this issue 5 years ago

teabot commented 6 years ago

I arrived here after viewing this (excellent) presentation. I'm very keen to understand Marquez in more detail, as it appears to align with many of my metadata goals. It'd be great to have some visibility into the design and roadmap of the project. I believe that the Google document linked from the README might contain useful information, but I do not currently have permission to view it. On following the link I see: "You need permission."

wslulciuc commented 6 years ago

@teabot : It's great to hear that you're interested in learning more about Marquez! The design doc linked in the README was originally authored internally at WeWork, then shared publicly. If you shoot me an email at willy.lulciuc@wework.com I'd be happy to get you access to the doc. And given that the project is in the early development phase, it'd be great to get your feedback as well. Note that there are gaps in the design doc and sections that need to be revisited, but updates to the doc will happen more regularly.

We are currently working on building metadata collection as a core requirement into all jobs (streaming or batch) at WeWork. Our immediate focus is to integrate Marquez with Airflow in order to capture the job (=task) runtime arguments, input/output datasets, and state (RUNNING, COMPLETED, etc.). This will help define both the Job API and Dataset API.

We have an internal roadmap and milestones for Marquez at WeWork, but our goal is to be transparent about the project and its direction. I just opened issue #74

wslulciuc commented 5 years ago

@teabot : We have made our design public! See https://marquezproject.github.io/marquez