Open Kryt87 opened 1 day ago
Hi @Kryt87 thanks for opening this issue, but sorry to hear you're running into this problem.
Quick question before diving into what could be happening here. What is the name of your source database you would expect the package to be reading from as opposed to the target? In your reddit_ads_database
variable I see you have defined ingestion
, and then I can see in the log that ingestion
was used for the query where no campaign_report table was found.
While listing relations in database=ingestion, schema=fivetran_reddit_ads
Is the ingestion
database the source or is there a different database you would like the package to read from?
Hi Joe, thanks for jumping on this so fast. The source database is called
ingestion
, where Fivetran is writing to and we want to read from. Our
target database, where we want the models to reside is called prod
.
Thanks for clarifying that! It does seem like from the logs that the package is reading from the ingestion
database 🤔
Would you be able to confirm that in this case you do see the campaign_report
table in the ingestion.fivetran_reddit_ads
location in your Redshift destination?
I can confirm that the campaign_report
table exists in ingestion.fivetran_reddit_ads
.
However, when I ran a few tests I was able to confirm that the packages was not reading from the ingestion database. In a dev target database I created the dummy table dbt_kurt_drew.fivetran_reddit_ads.test_reddit_ads
. I then ran it again, where the debugging logs showed:
[0m03:13:30.546025 [debug] [Thread-1 (]: While listing relations in database=ingestion, schema=fivetran_reddit_ads, found: test_reddit_ads
[0m03:13:30.549303 [warn ] [Thread-1 (]:
Please be aware: The CAMPAIGN_REPORT table was not found in your REDDIT_ADS schema(s).
This dummy table does not exist in the ingestion
database.
Thanks for sharing. One last question before diving deeper - do you have any other configs (outside of what you already shared in the issue description) in your dbt_project.yml
vars or models sections that could be causing the package to read from the incorrect location?
Nothing that stands out to me 😞
Thanks for checking @Kryt87, I was able to dig into this deeper and believe I may have found the culprit here.
I was able to recreate this error on Redshift. I did notice that this worked as expected on all other destinations which was a bit odd to me. However, for Redshift I saw the exact same behavior as you mentioned. I then went under the hood and was inspecting these lines which are run during the first query step of the Ad Reporting models and found that the proper source database.schema.table
was being used. However, the relation.value
would always be None
and therefore result in the empty table warning you're seeing.
So it seems the package is interpreting the variables properly, but the actual relation.value is not working as expected. I then decided to hardcode a select * from database.schema.value
in one of my models and I immediately saw the below error.
Database Error in model stg_reddit_ads__ad_tmp (models/tmp/stg_reddit_ads__ad_tmp.sql)
Datasharing is not enabled on this cluster
This would make sense why the variables are working as expected, but then the actual relation is failing since datasharing isn't enabled on my end. Do you by chance have datasharing enabled on your end to facilitate this? I imagine if you're able to enable this then you'll be able see success when reading from one database and write to another.
Let me know if this helps address your issue!
I'm glad you are seeing the same error, I find Redshift can be very particular about this kind of thing. Regarding data sharing, it is enabled for the cluster. However, there are no data shares set up between these two databases and there shouldn't need to be as they are on the same cluster. Data sharing is primarily designed to facilitate sharing data across different Redshift clusters, which can be in different AWS accounts or regions.
When testing, I created both a staging model (view) that sourced the campaign_report
and referenced it in an intermediate model. This all worked as expected, even though the source was on the ingestion
database and the staging and intermediate models were created on a seperate dev databases.
I also was confused why the data sharing was not enabled for the same cluster, but that's the error I was seeing.
It's also strange that you're able to query directly from the source in that staging model yet you don't see the error I saw. Could you share your test staging model view and the log result? I'm wondering if something may be causing issues at the adapter.get_relations
level within dbt-redshift itself 🤔
I think you are on the right track with dbt-redshift.
The view is rather simple:
with source as (
select * from {{ source('fivetran_reddit_ads', 'campaign_report') }}
),
renamed as (
select * from source
)
select * from renamed
With the source being:
sources:
- name: fivetran_reddit_ads
description: ""
database: ingestion
tables:
- name: campaign_report
description: ""
debug.log - slimmed down debugging logs
Is there an existing issue for this?
Describe the issue
Selecting the source database using relevant
<connector>_database
(reddit_ads_database
etc.) variables in thedbt_project.yml
file is not working as expected when the target database is not the same as the database where the data is stored. In this instance many "Please be aware" messages about being unable to find source tables in the database.After reviewing the debug level logs and running a few tests, I observed that when running the staging models, a query against the Redshift
information_schema.tables
table is run but is pointing to theinformation_schema
schema on the target database and not the data source database. Which is why the tables cannot be found.Relevant error log or model output
CLI Output:
Debugging Logs:
Expected behavior
Expected that the required source tables could be identified in the
ingestion
database. There is no issue if the target database is the same as the target database.Possible solution
The
redshift__list_relations_without_caching
appears to be what is used to list the tables in a schema. Not sure what needs to happen to get it pointing at the correct database.dbt Project configurations
Package versions
What database are you using dbt with?
redshift
How are you running this dbt package?
dbt Core™
dbt Version
Core:
Plugins:
Additional Context
No response
Are you willing to open a PR to help address this issue?