MarquezProject / marquez

Collect, aggregate, and visualize a data ecosystem's metadata
https://marquezproject.ai
Apache License 2.0

Column lineage query returning null for namespace and name #2875

Closed by mattwparas 3 months ago

mattwparas commented 3 months ago

It looks like since the change in #2821, the column lineage query returns null values for namespace and name, which causes the get dataset endpoint to return a 500 because both fields are marked as non-null. I've tested with the old query and it returns the column lineage properly. I have not yet investigated why.

davidsharp7 commented 3 months ago

Have you got some examples we could look at?

mattwparas commented 3 months ago

Yeah, I can paste some examples. It's a bit difficult since we've got quite a bit of data, so I'll try to distill it down to something manageable.

sophiely commented 3 months ago

Hi all!

I'm facing the same issue here. I think it's because the namespace and name are provided by datasets_view, but this view only contains the latest version of a dataset. Sometimes the column lineage was created by a version that is not the latest one, so that version won't be found in datasets_view. I fixed the query so that it reads from the datasets_version table instead of datasets_view. Let me know what you think :)
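
The explanation above can be reproduced in miniature. This is only a minimal sketch and not the actual Marquez query or schema: the table and view names (`versions_demo`, `latest_view_demo`, `column_lineage_demo`) and their columns are hypothetical, and it assumes that each lineage row points at a specific dataset version and that the query uses a LEFT JOIN. It runs on an in-memory SQLite database rather than the project's real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for a table holding every dataset version.
CREATE TABLE versions_demo (
    version_uuid   TEXT PRIMARY KEY,
    namespace_name TEXT NOT NULL,
    dataset_name   TEXT NOT NULL,
    created_at     INTEGER NOT NULL
);

-- Hypothetical lineage table: each row references one dataset version.
CREATE TABLE column_lineage_demo (
    version_uuid TEXT NOT NULL,
    field_name   TEXT NOT NULL
);

-- View that exposes only the most recent version of each dataset.
CREATE VIEW latest_view_demo AS
SELECT v.*
FROM versions_demo v
WHERE v.created_at = (
    SELECT MAX(i.created_at)
    FROM versions_demo i
    WHERE i.namespace_name = v.namespace_name
      AND i.dataset_name = v.dataset_name
);

-- Two versions of the same dataset; the lineage was recorded against 'v1',
-- which is no longer the latest version.
INSERT INTO versions_demo VALUES ('v1', 'ns', 'orders', 1), ('v2', 'ns', 'orders', 2);
INSERT INTO column_lineage_demo VALUES ('v1', 'order_id');
""")


def lineage_via(source):
    """Resolve namespace and name for each lineage row through `source`."""
    # `source` is only ever one of the two fixed names used below.
    query = f"""
        SELECT cl.field_name, s.namespace_name, s.dataset_name
        FROM column_lineage_demo cl
        LEFT JOIN {source} s ON s.version_uuid = cl.version_uuid
    """
    return conn.execute(query).fetchall()


via_view = lineage_via("latest_view_demo")
via_versions = lineage_via("versions_demo")

print(via_view)      # [('order_id', None, None)]
print(via_versions)  # [('order_id', 'ns', 'orders')]
```

Joining through the latest-only view leaves namespace and name as NULL for lineage recorded against a non-latest version, which matches the symptom reported in this issue. Joining through the full versions table resolves both fields.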