Closed: tomasmv closed this issue 1 year ago
This sounds like a config issue. If you're using a custom schema name macro you'll want to set `upstream_prod_env_schemas: True` and set `upstream_prod_schema` to the schema name used in your production target.
You only need to set `upstream_prod_database` if your dev and prod environments are in different databases. If they are, it should be set to your prod database. You mention setting it to `stg` - is this definitely your production database?
Give that a try and let me know if it still doesn't work. I've tried to outline the setup steps in the README, but I'd welcome any suggestions for improving them!
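For reference, a minimal `dbt_project.yml` vars block along those lines might look like this - the values are placeholders, so substitute your own production schema and database names:

```yaml
# Sketch only - the schema/database values below are placeholders.
vars:
  upstream_prod_env_schemas: True   # needed when using a custom generate_schema_name macro
  upstream_prod_schema: analytics   # placeholder: the schema name in your production target
  upstream_prod_database: prod_db   # placeholder: only needed if dev and prod use different databases
```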
Ahhh, you're right! It was the env schemas variable I didn't have set properly. To your question about databases: we have two Redshift clusters, one for prod and one for stage. All of our development happens in the stage cluster, where we keep separate databases for dev, plus a `stg` database that runs all of our code prior to release. We wanted dev to use the `stg` database, and I have that working.
I'm using `upstream_prod_disabled_targets` to filter out our prod target, but do you have anything similar for schemas? Say I don't want to use upstream prod for anything in the `seed` schema of the dev database - is that supported at all?
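For context, the target filtering mentioned here is configured as a list var; a sketch, with placeholder target names:

```yaml
# Sketch: targets where the package should be bypassed (target names are placeholders).
vars:
  upstream_prod_disabled_targets:
    - prod
```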
Ok, I've been digging into this more since I keep having the same issue. I ended up going through some of the dbt Postgres adapter source code, and I'm not sure cross-database `adapter.get_relation` calls are allowed. If I'm reading this right, the list of tables/schemas is limited to the current database: https://github.com/dbt-labs/dbt-core/blob/main/plugins/postgres/dbt/include/postgres/macros/adapters.sql#L81-L101. So `adapter.get_relation` won't be able to find the table in my `upstream_prod_database`.
Hmm, I see what you mean. `get_relation` calls `list_relations`, which uses `list_relations_without_caching`. However, it looks like the `database` arg is passed through to here, which should let the macro use the right database.
Can you see any info about this event in your logs? Try searching for `"code": "E014"` and see if your table is listed.
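A quick way to run that search, assuming JSON-formatted logs in dbt's default `logs/dbt.log` location (adjust the path for your setup):

```shell
# Sketch: look for E014 events in the dbt debug log.
# The log path and JSON formatting are assumptions - adjust for your project.
LOG="logs/dbt.log"
mkdir -p "$(dirname "$LOG")" && touch "$LOG"  # avoid a grep error if the file is missing
grep -F '"code": "E014"' "$LOG" || echo "no E014 events found in $LOG"
```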
> I don't want to use upstream prod for anything in the `seed` schema of the dev database, is that supported at all?
Not at the moment but it's a good idea. Would you want to exclude based on the schema or project directory (or maybe both)?
I'm not sure if it's related, but I'm also having an issue with `adapter.get_relation`. We have multiple production schemas, but each developer has a single developer schema. Referencing the documentation here, I tried:
```jinja
{%- set source_relation = adapter.get_relation(
    database=this.database,
    schema="dbt_jploshnick",
    identifier=this.name) -%}
{{ log("Source Relation: " ~ source_relation, info=true) }}
```
This will return the table name. However, if I run:
```jinja
{%- set source_relation = adapter.get_relation(
    database=this.database,
    schema="int_core",
    identifier=this.name) -%}
{{ log("Source Relation: " ~ source_relation, info=true) }}
```

`None` is returned. Any idea why that is happening? We have modified our `generate_schema_name` and `generate_alias_name` macros, which I imagine is causing the issue.
Thanks so much for putting this package together! If I can get it to work for us, it'll be a huge help.
I guess what I can't figure out is: assuming I had changed nothing, how would the macro be able to know about the `int_core` version of the model? The `int_core` version was created in a cloud job, while the `dbt_jploshnick` version is created in the cloud IDE. Do I need to import that cloud job's manifest for this package to work?
@JackPloshnick Thanks for adding this. Are you also using Redshift?
This is where we get the schema that would be used in prod when you're using custom schema names. However, I think I've spotted a bug that means only the prefix would be returned, e.g. `int`. I'll look into fixing it and improving the tests.
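To illustrate the kind of setup involved, here is a common custom `generate_schema_name` pattern - this is illustrative, not this project's exact macro. In prod it returns the custom schema as-is (e.g. `int_core`), so a bug that kept only the prefix (`int`) would point the package at a schema that doesn't exist:

```jinja
{# Illustrative only - a common custom generate_schema_name pattern, not this project's code. #}
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {{ custom_schema_name | trim }}   {# prod: e.g. int_core #}
    {%- else -%}
        {{ target.schema }}               {# dev: e.g. dbt_jploshnick #}
    {%- endif -%}
{%- endmacro %}
```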
Lastly, would you mind sharing your custom schema & alias macros? It's really helpful for me to see how other dbt projects are set up so the package works with as many as possible. I'm on the dbt Slack if you'd rather share them privately.
@tomasmv @JackPloshnick I've opened a PR that should fix the errors you're seeing with `None` being returned. Could you please test it out? You can use the branch by adding this to `packages.yml`:
```yaml
packages:
  - git: https://github.com/LewisDavies/upstream-prod.git
    revision: fix-env-schema-target
```
There's also a change to the `generate_schema_name` macro - update instructions are in the README. The package will now print the fully-qualified name of the relation it's trying to find instead of returning `None`, so it should be easier to understand any future issues.

FYI @tomasmv: I understand that cross-database queries may not work in Redshift unless you're using RA3 nodes. Thanks to @JackPloshnick for pointing that out.
I'll probably create a new release today or tomorrow. The instructions above will still work if you want to try it out now. Thanks for your help!
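As a quick check of the cross-database limitation mentioned above: three-part naming like the following only works on Redshift RA3 (or Serverless) clusters, and the names below are placeholders:

```sql
-- Sketch: cross-database reference in Redshift (requires RA3/Serverless).
-- database.schema.table - placeholder names, substitute your own.
select count(*)
from stg.int_core.my_model;
```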
Hello!
Thank you for creating this project, it's exactly what my team and I were looking for! I'm trying to get it set up on our Redshift cluster, but I'm hitting an error: my `adapter.get_relation` call at the end keeps returning `None`. I've double-checked the database, schema, and identifier inputs and everything looks good; it's just failing to find the relation. Is there something extra I need to set up in my profiles or manifests that I'm missing? In my dbt project I've set `upstream_prod_database: stg`, and we're using a custom `generate_schema_name` macro, so I'm not specifying anything on the schema front.

When I carve up the package and do something like
```jinja
{% set prod_ref = parent_ref | replace(parent_ref.database, prod_database) %}
```
it works fine. So I'm guessing there's something about `get_relation` specifically that I'm not understanding.
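For anyone else hitting this, the string-replacement workaround above can be sketched slightly more fully. Note these are assumptions, not the package's actual code: `parent_ref` must render as a `database.schema.identifier` string for `replace` to work, and `upstream_prod_database` is the package var discussed earlier in this thread:

```jinja
{# Sketch of the workaround: swap the dev database for the prod one by string
   replacement. Assumes parent_ref renders as database.schema.identifier and
   that the relation actually exists in the prod database. #}
{% set prod_database = var('upstream_prod_database') %}
{% set prod_ref = parent_ref | string | replace(parent_ref.database, prod_database) %}
```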