Bachmann1234 / diff_cover

Automatically find diff lines that need test coverage.
Apache License 2.0
713 stars 192 forks source link

diff everything on removal #367

Open cboden opened 1 year ago

cboden commented 1 year ago

In our CI/CD we're running the command diff-quality --violations sqlfluff --compare-branch=master --fail-under=100 on version 7.7.0 on a branch that has only deleted/removed files. We have hundreds of files. Normally, if any file is modified or added, diff-quality only checks those files and takes several seconds. When the branch only has removals it's scanning everything and taking 40 minutes.

Bachmann1234 commented 1 year ago

interesting, ill have to find some time to see if I can reproduce this and see whats up

cboden commented 1 year ago

An update: This issue is non-deterministic.

Scenario: We have a file that would fail a lint/violation check, but it shouldn't be scanned in our given scenario. I've confirmed via git diff --name-only master the file failing is not present, only the removed files are. My expectation is diff-quality would pass 0 files to the violations tool. (as I write this, I realize I'm making some assumptions on how the interaction between diff_cover and the sqlfluff plugin might work)

When attempting to replicate locally, my first attempt running diff-quality passed. I re-traced my steps to ensure I did the same thing as CI/CD and got it to fail on the file that wasn't modified. I then re-ran the exact same command listed in my first comment 5 times in succession. The result was 4 failures and 1 pass.

cboden commented 1 year ago

Upgraded to 8.0 with same result. Using Python 3.9.9.

Bachmann1234 commented 1 year ago

So its not immediately obvious to me why diff-quality would be acting differently on removed files.

the basic algorithm is

run the quality tool command over the project

Run 'git -c diff.mnemonicprefix=no -c diff.noprefix=no diff --no-color --no-ext-diff -U0 origin/main...HEAD'

compare the lines reported the quality tool to the lines in the diff.

Now, I do think sql fluff is a bit different in that it runs over each and every file. https://github.com/sqlfluff/sqlfluff/blob/9579127595333037aa893adb4384d7612c32e0ed/src/sqlfluff/diff_quality_plugin.py#L63

@barrywhart do you have any insight on anything on this issue?

@cboden do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

barrywhart commented 1 year ago

No immediate insights, but I see that the SQLFluff plugin logs the SQLFluff command before running it. It may be helpful to look at that.

One random thought (no evidence, just brainstorming) -- is it possible it's running SQLFluff against one or more directory paths, hence processing all the files in that directory?

barrywhart commented 1 year ago

Two more thoughts:

cboden commented 1 year ago

do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

It's a small number and is happening on every PR we open like it. The latest PR had the following changes to a dbt repository:

barrywhart commented 1 year ago

My guess is that this is not a bug; that the repo is corrupted somehow.

cboden commented 1 year ago

My guess is that this is not a bug; that the repo is corrupted somehow.

How can we test/confirm this? How would a corruption apply to diff_cover or sqlfluff?

barrywhart commented 1 year ago

Run the same git command that diff-cover uses and see if it includes more files than it should.

cboden commented 1 year ago

I've confirmed via git diff --name-only master the file failing is not present, only the removed files are.

Is there another git command I should test?