NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team
14 stars 0 forks source link

add optional left_columns arg #712

Closed fvankrieken closed 2 months ago

fvankrieken commented 2 months ago

To specify columns, can do so like

WITH a AS (
    ...
),
clipped_to_nyc AS (
    {{ clip_to_geom(left=a, left_by="wkb_geometry", left_columns=["variable_type", "variable_id"]) }}
)
SELECT * FROM clipped_to_nyc
sf-dcp commented 2 months ago

nice! Have you given it a try?

Also, if you have "geom" column in either of the original tables, will they be also present in the final result? Meaning, we would end up with 2+ geom columns?

fvankrieken commented 2 months ago

I did use it.

Nothing from the right table is selected, and that's why there's the {{ dbt_utils.star(from=left, except=[left_by]) }} when no columns are supplied, so that the left geom column is not selected/duplicated

fvankrieken commented 2 months ago

You know, I compiled a model using it, not ran. Let me run it with a CTE

sf-dcp commented 2 months ago

I did use it.

Nothing from the right table is selected, and that's why there's the {{ dbt_utils.star(from=left, except=[left_by]) }} when no columns are supplied, so that the left geom column is not selected/duplicated

Got it. In the second part of the logic below, it seems like we will be selecting left_by column, no?

        {% else %}
            {% for column in left_columns %}
                left.{{ column }},
            {% endfor %}
fvankrieken commented 2 months ago

That's iterating through the argument explicitly provided for selecting by the user, so I'm just going to trust people to select what they need in that case

fvankrieken commented 2 months ago

Caught a little bug, works now

sf-dcp commented 2 months ago

@fvankrieken, I tried to use the macro in my code like this and I'm getting a syntax error:


WITH reprojected AS ( ...
),

clipped_to_nyc AS (
    {{ clip_to_geom(left=reprojected, left_by="geom", left_columns=["objectid"]) }}
),

Here is the compiled query:

reprojected AS (...
),

clipped_to_nyc AS (
    SELECT
         .objectid,
         ST_INTERSECTION(.raw_geom, "db-green-fast-track"."sf_gft_shadows"."stg__nyc_boundary".geom) AS geom
    FROM 
    INNER JOIN "db-green-fast-track"."sf_gft_shadows"."stg__nyc_boundary" 
    ON ST_INTERSECTS(.raw_geom, "db-green-fast-track"."sf_gft_shadows"."stg__nyc_boundary".geom)

),

So the compiled query doesn't know the CTE name. Do you know what I'm doing wrong? You said it worked on your CTE

sf-dcp commented 2 months ago

Nevermind, solved the issue! The CTE name should be in quotes when calling the macro