Bachmann1234 / diff_cover

Automatically find diff lines that need test coverage.
Apache License 2.0
710 stars 191 forks source link

diff everything on removal #367

Open cboden opened 1 year ago

cboden commented 1 year ago

In our CI/CD we're running the command diff-quality --violations sqlfluff --compare-branch=master --fail-under=100 on version 7.7.0 on a branch that has only deleted/removed files. We have hundreds of files. Normally, if any file is modified or added, diff-quality only checks those files and takes several seconds. When the branch only has removals it's scanning everything and taking 40 minutes.

Bachmann1234 commented 1 year ago

interesting, ill have to find some time to see if I can reproduce this and see whats up

cboden commented 1 year ago

An update: This issue is non-deterministic.

Scenario: We have a file that would fail a lint/violation check, but it shouldn't be scanned in our given scenario. I've confirmed via git diff --name-only master the file failing is not present, only the removed files are. My expectation is diff-quality would pass 0 files to the violations tool. (as I write this, I realize I'm making some assumptions on how the interaction between diff_cover and the sqlfluff plugin might work)

When attempting to replicate locally, my first attempt running diff-quality passed. I re-traced my steps to ensure I did the same thing as CI/CD and got it to fail on the file that wasn't modified. I then re-ran the exact same command listed in my first comment 5 times in succession. The result was 4 failures and 1 pass.

cboden commented 1 year ago

Upgraded to 8.0 with same result. Using Python 3.9.9.

Bachmann1234 commented 1 year ago

So its not immediately obvious to me why diff-quality would be acting differently on removed files.

the basic algorithm is

run the quality tool command over the project

Run 'git -c diff.mnemonicprefix=no -c diff.noprefix=no diff --no-color --no-ext-diff -U0 origin/main...HEAD'

compare the lines reported the quality tool to the lines in the diff.

Now, I do think sql fluff is a bit different in that it runs over each and every file. https://github.com/sqlfluff/sqlfluff/blob/9579127595333037aa893adb4384d7612c32e0ed/src/sqlfluff/diff_quality_plugin.py#L63

@barrywhart do you have any insight on anything on this issue?

@cboden do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

barrywhart commented 1 year ago

No immediate insights, but I see that the SQLFluff plugin logs the SQLFluff command before running it. It may be helpful to look at that.

One random thought (no evidence, just brainstorming) -- is it possible it's running SQLFluff against one or more directory paths, hence processing all the files in that directory?

barrywhart commented 1 year ago

Two more thoughts:

cboden commented 11 months ago

do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?

It's a small number and is happening on every PR we open like it. The latest PR had the following changes to a dbt repository:

barrywhart commented 11 months ago

My guess is that this is not a bug; that the repo is corrupted somehow.

cboden commented 11 months ago

My guess is that this is not a bug; that the repo is corrupted somehow.

How can we test/confirm this? How would a corruption apply to diff_cover or sqlfluff?

barrywhart commented 11 months ago

Run the same git command that diff-cover uses and see if it includes more files than it should.

cboden commented 11 months ago

I've confirmed via git diff --name-only master the file failing is not present, only the removed files are.

Is there another git command I should test?