Open cboden opened 1 year ago
interesting, ill have to find some time to see if I can reproduce this and see whats up
An update: This issue is non-deterministic.
Scenario: We have a file that would fail a lint/violation check, but it shouldn't be scanned in our given scenario. I've confirmed via git diff --name-only master
the file failing is not present, only the removed files are. My expectation is diff-quality
would pass 0 files to the violations tool. (as I write this, I realize I'm making some assumptions on how the interaction between diff_cover and the sqlfluff plugin might work)
When attempting to replicate locally, my first attempt running diff-quality passed. I re-traced my steps to ensure I did the same thing as CI/CD and got it to fail on the file that wasn't modified. I then re-ran the exact same command listed in my first comment 5 times in succession. The result was 4 failures and 1 pass.
Upgraded to 8.0 with same result. Using Python 3.9.9.
So its not immediately obvious to me why diff-quality would be acting differently on removed files.
the basic algorithm is
run the quality tool command over the project
Run
'git -c diff.mnemonicprefix=no -c diff.noprefix=no diff --no-color --no-ext-diff -U0 origin/main...HEAD'
compare the lines reported the quality tool to the lines in the diff.
Now, I do think sql fluff is a bit different in that it runs over each and every file. https://github.com/sqlfluff/sqlfluff/blob/9579127595333037aa893adb4384d7612c32e0ed/src/sqlfluff/diff_quality_plugin.py#L63
@barrywhart do you have any insight on anything on this issue?
@cboden do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?
No immediate insights, but I see that the SQLFluff plugin logs the SQLFluff command before running it. It may be helpful to look at that.
One random thought (no evidence, just brainstorming) -- is it possible it's running SQLFluff against one or more directory paths, hence processing all the files in that directory?
Two more thoughts:
diff-quality
command (not using SQLFluff plugin) has the same issue. As they say, "absence of evidence is not evidence of absence." From this code, it's clear that both branches (batch or non-batch handling) process src_paths_changed
.do you have an example repo we can use to debug this at all? Also the branch with the deleted files. Can you share if its a ton of deleted files? Or just s small number?
It's a small number and is happening on every PR we open like it. The latest PR had the following changes to a dbt repository:
My guess is that this is not a bug; that the repo is corrupted somehow.
My guess is that this is not a bug; that the repo is corrupted somehow.
How can we test/confirm this? How would a corruption apply to diff_cover or sqlfluff?
Run the same git command that diff-cover uses and see if it includes more files than it should.
I've confirmed via git diff --name-only master the file failing is not present, only the removed files are.
Is there another git command I should test?
In our CI/CD we're running the command
diff-quality --violations sqlfluff --compare-branch=master --fail-under=100
on version 7.7.0 on a branch that has only deleted/removed files. We have hundreds of files. Normally, if any file is modified or added, diff-quality only checks those files and takes several seconds. When the branch only has removals it's scanning everything and taking 40 minutes.