k-popov opened 4 months ago
I found more evidence suggesting this happens because the dataset combines DBT and PostgreSQL. Below is a screenshot of the "combined" view:
Meanwhile, this is the same table in DBT:
And the same in PostgreSQL:
So for some of the fields, the tags are as follows:

| Column | DBT | PostgreSQL | Combined |
|---|---|---|---|
| id | PDN | PDN | |
| call_break | PDN | NoPDN | PDN, NoPDN |
| call_tries_count | NoPDN | NoPDN | NoPDN, NoPDN |
Notice that the tags in the "combined" view (not sure what the correct term is) are "concatenated" rather than "merged": if the same tag (same text and also the same URN) is set in both sources, the combined view contains it twice.
For different tags (the `call_break` column) this seems to be the correct behaviour (although semantically the two tags contradict each other, which is a mistake in the data markup). But if the tags are the same, only one should be kept.
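The expected "merge" behaviour can be sketched as deduplicating tags by URN while keeping distinct tags from both sources. This is a minimal Python illustration, not DataHub's actual code: `TagAssociation`, `merge_tags` and the example URNs are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagAssociation:
    """Hypothetical stand-in for a tag attached to a column."""
    urn: str
    name: str


def merge_tags(*sources: list[TagAssociation]) -> list[TagAssociation]:
    """Merge tag lists from several platforms, keeping one entry per tag URN.

    Order of first appearance is preserved, so tags from earlier sources
    come first.
    """
    seen: set[str] = set()
    merged: list[TagAssociation] = []
    for tags in sources:
        for tag in tags:
            if tag.urn not in seen:  # skip a tag already contributed by another source
                seen.add(tag.urn)
                merged.append(tag)
    return merged


# Hypothetical URNs for the two tags from the table above.
PDN = TagAssociation("urn:li:tag:PDN", "PDN")
NO_PDN = TagAssociation("urn:li:tag:NoPDN", "NoPDN")

# call_break: different tags in DBT and PostgreSQL, so both are kept
print([t.name for t in merge_tags([PDN], [NO_PDN])])  # ['PDN', 'NoPDN']
# call_tries_count: the same tag in both sources, so it is shown once
print([t.name for t in merge_tags([NO_PDN], [NO_PDN])])  # ['NoPDN']
```

Deduplicating on the URN rather than the display name matches the observation that both duplicated tags resolve to the same URN.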
I posted a suggested workaround for the issue: #10964. It does the job (no duplicate tags are shown) but may break something else. It needs a review (or maybe even a rewrite) from someone more familiar with this part of DataHub.
Describe the bug
When viewing a dataset that combines DBT and PostgreSQL (ingested separately and linked together via the `platform_instance` recipe option), the same tag is displayed twice. Clicking on either of the `No PDN` tags causes DataHub to request the same tag URN (checked with the browser inspector tool). The response to the `getDataset` GraphQL request issued by the page also shows that only one tag is assigned to the column.

DataHub 0.13.2 running in Kubernetes.
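To Reproduce
Steps to reproduce the behavior:
1. Set up separate DBT and PostgreSQL ingestions linked via the `platform_instance` recipe option (PostgreSQL set up in the UI, DBT run from the CLI) and ingest the corresponding data.

Expected behavior
The tag is displayed only once.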