Selecting models using -m state:modified does not work as expected

tuanchris commented 3 years ago

Describe the bug

Selecting modified models with -m state:modified, combined with --state to specify the state directory, does not seem to work as expected.

Steps To Reproduce

When I ran the following command with the manifest.json file in the current directory, dbt picked up 229/234 models.

dbt run --fail-fast --threads 1 --target prod -m state:modified+ --state .

[Screenshot of the command output]

When I selected only the modified models (without their downstream models) using the following command, dbt did not pick up any changes, which is the expected behavior:

dbt run --fail-fast --threads 1 --target prod -m state:modified --state .

When I ran dbt test, selecting only modified models with the following command, dbt picked up 30/160 tests:

dbt test --fail-fast --threads 1 --target prod -m state:modified --state .

[Screenshot of the command output]

Expected behavior

I expected none of the three commands to run anything, since there is no change whatsoever.

Screenshots and log output

Screenshots added above.

System information

Which database are you using dbt with?

[ ] postgres
[ ] redshift
[X] bigquery
[ ] snowflake
[ ] other (specify: ____)

The output of dbt --version:

installed version: 0.19.0
   latest version: 0.19.0

Up to date!

Plugins:
  - bigquery: 0.19.0
  - snowflake: 0.19.0
  - redshift: 0.19.0
  - postgres: 0.19.0

The operating system you're using: macOS Big Sur 11.2.3 (20D91)

The output of python --version: Python 3.7.4

jtcohen6 commented 3 years ago

Hey @tuanchris, state:modified is a tricky, powerful feature. I'll need a few more details to figure out exactly what's going on in your project.

Given what you shared so far, here's my best inductive reasoning:

-m state:modified doesn't pick up any changes because no models are modified. Makes sense!
-m state:modified+ does pick up a lot of models, implying that a non-model is both "modified" and upstream of many models.
It is a current limitation of state comparison that dbt detects a source like the one below as modified when comparing across environments:

sources:
  - name: my_postgres_db
    database: "{{ 'raw' if target.name == 'prod' else 'raw_sampled' }}"

Proposed resolutions to this:

[#2744] Fix this behavior by comparing un-rendered versions of source properties, similar to how dbt now compares un-rendered forms of configs in v0.19.0 (#2713).
[#2704] Expose more granular state "subselectors" that would let you pick and choose between modification causes, and fine tune your selection criteria in complex, noisy projects.

If you have env/target-based behavior in your definitions for any sources, that would explain the behavior you're seeing. If that's not the case, let's keep looking for the root cause.
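To check whether a project has target-dependent or environment-dependent definitions of this kind, a rough text search over the YAML files can help. The following is a minimal sketch, not a dbt feature: the function name find_target_dependent is hypothetical, and the pattern (references to target. or env_var) is only a heuristic that will also match model properties, not just sources.

```shell
# Hypothetical helper: print YAML lines that reference `target.` or `env_var`,
# which are likely to render differently from one target to another.
find_target_dependent() {
  dir="$1"  # directory to search, e.g. models
  # /dev/null forces grep to print file names even when only one file matches.
  find "$dir" \( -name '*.yml' -o -name '*.yaml' \) \
    -exec grep -nE 'target\.|env_var' /dev/null {} +
}

# Usage (assuming source definitions live under models/):
#   find_target_dependent models
```

Each hit is a candidate to review by hand; whether it actually triggers a false "modified" result depends on how dbt compares that property.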

tuanchris commented 3 years ago

Hi @jtcohen6,

Thank you for getting back to me! Yes, we do have sources whose database depends on the target, like the following:

sources:
  - name: my_postgres_db
    database: "{{ 'raw' if target.name == 'prod' else 'raw_sampled' }}"

I guess that, because of this, dbt picked up unmodified models as modified. However, in my case the artifact was generated using the production environment, and the CI code that I'm running also targets the prod environment, hence the --target prod flag in my commands.

My question is, is it the expected behavior, even when the state & the executing dbt is in the same environment? Also is there any work around for this at the moment or we would have to wait for either #2744 or #2704 to be implemented?

jtcohen6 commented 3 years ago

My question is, is it the expected behavior, even when the state & the executing dbt is in the same environment?

Ah good point—no, this is not the expected behavior! If the resolved Jinja value is the same, because the targets are the same, dbt should not be detecting a modification. It sounds like there may be something else afoot. Could you try running just:

dbt ls --target prod -s state:modified --state .

Using the same artifact, and report what's returned back? (The -s in place of -m is important, since it will include non-models as well.)
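To narrow down which kind of resource is being flagged, the same listing can be repeated once per resource type. This is a minimal sketch: the function name list_modified is hypothetical, and it assumes the --resource-type option of dbt ls accepts the values shown in the loop.

```shell
# Hypothetical helper: list what dbt considers modified, one resource type at a time.
list_modified() {
  target="$1"     # e.g. prod
  state_dir="$2"  # directory containing the manifest.json to compare against
  for rtype in source model seed snapshot test; do
    echo "== $rtype =="
    dbt ls --target "$target" --resource-type "$rtype" -s state:modified --state "$state_dir"
  done
}

# Usage, matching the command above:
#   list_modified prod .
```

If entries appear only under the source heading, that would point toward a modified source as the reason state:modified+ selects so many downstream models.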

Also is there any work around for this at the moment or we would have to wait for either #2744 or #2704 to be implemented?

Hm, I don't believe there's a workaround for this at present. Depending on what we identify as the root cause, #2744 may be the right resolution.

tuanchris commented 3 years ago

Ah, I think I got the problem. I should have run dbt compile --target prod instead of dbt compile. All seems to work as expected now. We might want to note this down somewhere in the documentation.
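The fix described here comes down to generating the comparison manifest with the same target that later commands use. A minimal sketch of that step follows; the function name save_state and the state directory name are hypothetical, and it assumes dbt writes manifest.json to the default target/ directory.

```shell
# Hypothetical helper: compile with an explicit target, then keep the resulting
# manifest as the state to compare against.
save_state() {
  target="$1"     # must match the --target used by later runs, e.g. prod
  state_dir="$2"  # where to store the comparison manifest
  dbt compile --target "$target" || return 1
  mkdir -p "$state_dir"
  # Assumes the default target-path of target/.
  cp target/manifest.json "$state_dir/manifest.json"
}

# Usage:
#   save_state prod state
#   dbt run --fail-fast --threads 1 --target prod -m state:modified+ --state state
```

Passing the target explicitly in both places avoids relying on whatever default target the profile happens to define.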

One potential workaround that I can think of is to generate both the prod and dev manifests whenever code is successfully merged to the master branch. On a push to the master branch, dbt would compare against the current prod manifest, while on a PR to the master branch, dbt would compare against the current dev manifest.
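That workaround could be sketched roughly as below. Everything named here is an assumption made for illustration: the function names, the state/<target>/ layout, and the CI event names push and pull_request. It also assumes dbt writes manifest.json to the default target/ directory.

```shell
# Hypothetical helper: compile once per target and keep each manifest separately,
# e.g. state/prod/manifest.json and state/dev/manifest.json.
save_states() {
  for target in "$@"; do
    dbt compile --target "$target" || return 1
    mkdir -p "state/$target"
    cp target/manifest.json "state/$target/manifest.json"
  done
}

# Hypothetical helper: choose the saved state for a CI event (event names are assumptions).
state_dir_for() {
  case "$1" in
    push)         echo "state/prod" ;;
    pull_request) echo "state/dev" ;;
    *)            echo "unknown event: $1" >&2; return 1 ;;
  esac
}

# Usage after a merge to master:
#   save_states prod dev
# Usage in a PR check:
#   dbt run --target dev -m state:modified+ --state "$(state_dir_for pull_request)"
```

Keeping one manifest per target means each comparison is made between artifacts rendered with the same target.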

dbt-labs / dbt-core