Open shangyian opened 7 months ago
This makes sense. I just have some reservations about the following statement, but let's talk about it before I spell out my concerns:
While we offer the ability for the user to specify RIGHT, LEFT or INNER joins, in practice when generating SQL, all RIGHT joins will be recast as LEFT joins (for performance reasons).
To clarify, when the user specifies RIGHT JOIN
, we'll be swapping the tables from the left hand side to the right hand side and doing a LEFT JOIN
, but in practice this is the same as a right join. This is mainly for performance optimization purposes, where some engines aren't optimized for right joins.
When linking dimensions, there are two types of links: join links and alias/reference links.
Join Link
You can configure a join link between any table/view-like node (source, transform, dimension) and a dimension node. Configuring this join link will make it so that all dimension attributes on the dimension node are available for the original node.
An example of a dimension join link:
ON
clause will just be equality comparisons between the primary key and foreign key columns of the original node and the dimension node. More complex join clauses can be configured if desired.RIGHT
,LEFT
orINNER
joins, in practice when generating SQL, allRIGHT
joins will be recast asLEFT
joins (for performance reasons).Alias/Reference Link
You can configure a dimension alias/reference between a particular column on a table/view-like node (source, transform, dimension) and a column on a dimension node. An example:
In this case, configuring a reference between
default.fact_transform.country_name
anddefault.country.name
will indicate that the semantic meaning behinddefault.fact_transform.country_name
refers to thedefault.country
dimension'sname
field. No join is required here, unless we explicitly create a dimension join between the two nodes.When we have monitoring capabilities, we'll be able to enforce dimension conformity.
Proposed Changes
Add a new backend endpoint for dimension aliasing/referencing
This endpoint can be done at the column-level, as in, for a given node column, the user can tell DJ if the column is meant to reference a particular dimension attribute.
In the database, we can store this using the
dimension_id
anddimension_column
fields on the current column's table. This was previously used for join links, which we now represent via thedimensionlink
table in the database.Expose the join link and the alias link functionalities in the UI
The join link vs alias link functionalities should be exposed separately in the UI, so that it is clearer to the user which type of link they are creating.
Alias links should probably continue to live under the Linked Dimensions column:
Join links can have a separate "create" button from elsewhere in the UI.