Bachmann1234 / diff_cover

Automatically find diff lines that need test coverage.
Apache License 2.0
702 stars 190 forks source link

Sub-folders and relative paths #179

Open fernandobrito opened 3 years ago

fernandobrito commented 3 years ago

Hello! This is my first attempt to use diff_cover, so please let me know if I'm missing something obvious 😄 .

To give more context, I want to use diff_cover together with the sqlfluff linter. They register themselves as a plugin and it seems to work very well on usual setups: https://docs.sqlfluff.com/en/stable/production.html#adding-diff-quality-to-your-builds. Amazing to see this kind of integration between different open source projects! Great feature being provided by diff_cover, thanks!

In my case, however, I use a "monorepo" repository architecture and I have my project that sqlfluff should run against on a subfolder. Something like:

git root: /Users/Fernando/my_repository/ project where I want to run diff-quality: /Users/Fernando/my_repository/project_1/

If I run diff-quality from inside project_1/, I will get file paths back from git diff relative to my_repository/ (git repository root), for example: project_1/file_1.sql, but I'm already inside the project_1/ folder. Passing such relative paths to the linting tools won't work. For this particular case, with sqlfluff, I also can't run it from outside project_1 due to a limitation on their side, but I think diff_cover should be generic enough (see some alternatives below) to handle it being called from sub-folders.

Alternative 1: absolute paths

My problem would be solved if when diff-quality passes file paths to the linting tools, it would pass absolute paths. sqlfluff would be able to find the target files correctly in this case. Maybe have an extra argument on diff-quality for this?

Alternative 2: relative path to where diff-quality is being called

Another possible solution would be if diff-quality would send file paths relative to where it is being called, and not to the repository root folder. It turns out that git diff accepts a --relative argument which does exactly that.

One very convenient benefit of this approach is that using --relative will also limit the scope of git diff to only show modified files in the folder where it is being run. My alternative 1 below would pass modified files to the linting tools even if they don't belong to project_1, which is not what I want when I call it from inside project_1.

Some options of implementing alternative 2:

Does it make sense? Is there another way to solve my issue?

fernandobrito commented 3 years ago

Ahaha, as a side note, I just spent like 30 minutes trying to figure out how to run (on my fork):

diff-quality --violations sqlfluff --diff-options "--relative"

Python argparse doesn't like this syntax. It will try to parse my last argument as a command and return:

usage: diff-quality [-h] --violations TOOL [--html-report FILENAME]
                    [--external-css-file FILENAME] [--compare-branch BRANCH]
                    [--options [OPTIONS]] [--diff-options [DIFF_OPTIONS]]
                    [--fail-under SCORE] [--ignore-staged] [--ignore-unstaged]
                    [--exclude EXCLUDE [EXCLUDE ...]]
                    [--diff-range-notation RANGE_NOTATION] [--version]
                    [--ignore-whitespace]
                    [input_reports [input_reports ...]]
diff-quality: error: unrecognized arguments: --relative

The solution is to run as:

diff-quality --violations sqlfluff --diff-options="--relative"

As found in a comment on this question: https://stackoverflow.com/questions/45541581/using-argparse-with-argument-values-that-begin-with-a-dash

Bachmann1234 commented 3 years ago

@fernandobrito if I understand this fully (and ill admit I can be a but slow sometimes :-P) just passing the absolute paths to quality tools will fix it (solution 1)

I don't believe we need an extra arg here because absolute paths and relative paths should be backwards compatible.

can you do me a favor and experiment with this branch and let me know if it helps you out?

https://github.com/Bachmann1234/diff_cover/pull/180

fernandobrito commented 3 years ago

Hi, @Bachmann1234, thanks a lot for your quick response and proposed solution!

I'm just testing your branch now and I realize two things: 1) The change you did doesn't get executed in my particular case because the tool that I use (sqlfluff) registers itself as a plugin inheriting from BaseViolationReporter, and your changes are in QualityReporter. Here is the code for their plugin: https://github.com/sqlfluff/sqlfluff/blob/master/src/sqlfluff/diff_quality_plugin.py. I guess that to make it work for all plugins, the path should be modified before calling BaseViolationReporter implementations.

2) But anyway, I could just take your changes and apply on their plugin locally to test it, and here are the results:

Assume my git root is in: /Users/Fernando/my_repository/ I want/have to run diff-quality from: /Users/Fernando/my_repository/project_1/

As a side note, if I run git diff from /Users/Fernando/my_repository/project_1/, I get the path from my changes relative to the git root:

So, before your changes, src_path is for example project_1/file_1.sql, which is bad, because I'm already inside project_1/, so the tool will look for project_1/project_1/file_1.sql.

With your changes, src_path will be /Users/Fernando/my_repository/project_1/project_1/file_1.sql, which is also incorrect. From what I understand, os.path.abspath(path) will just take the current directory path (/Users/Fernando/my_repository/project_1/) and join with the argument (project_1/file_1.sql), so that doesn't help. The correct way of getting the absolute path for the file would be to join src_path with the git root folder, and not to the current directory. Do you have already an easy way to get the git root folder?

As a side note, yesterday I also experimented with my other approach (--diff-options argument), which worked for me. My changes (without tests yet) can be seen here: https://github.com/Bachmann1234/diff_cover/compare/master...fernandobrito:add-diff-options-argument. And then I can call:

diff-quality --violations=sqlfluff --diff-options="--relative"

If we can make it work without any extra arguments, I don't mind either 😄 . But for my case, if we just convert relative paths to absolute paths, I would also suggest a --include argument to select which files should be passed to the tool (unless there's a way of using the existing --exclude for this in a way that I don't understand).

This is because If I'm calling diff-quality from /Users/Fernando/my_repository/project_1/, I don't want to pass changed files from /Users/Fernando/my_repository/project_2/ to the linting tool. The nice thing about my solution above is that I get this behavior for free (only listing changes in my current folder) when using git diff --relative.

I always end up writing too much, so let me know if it doesn't make sense and I can try to be more concise 😅

Bachmann1234 commented 3 years ago

I think if abspath worked like you said all my tests would have exploded.

I could be wrong here but I don't think abspath joins at all so much as converts the relative path to an absolute one by tracing backwards to the root.

Bachmann1234 commented 3 years ago

Ultimately I'm not completely against adding the arg (assuming tests and maybe a bit to the readme gets added)

Just if we could get this working without adding more arguments that seems cleaner as the diff parsing can be a bit fragile and I worry that a lot of attempts to do anything non trivial with this arg will have problems.

fernandobrito commented 3 years ago

I agree that it will be much simpler if there's no new argument (and users can also do a --include like I mentioned or if --exclude can support negations), but I think we need to get more creative with the paths.

abspath documentation states:

[...]. On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)).

https://docs.python.org/3/library/os.path.html#os.path.abspath

Also testing locally:

$ pwd                                                                                                                                                ✔  02:18 
> /tmp/git_root/project_1

$ python
>>> import os
>>> os.path.abspath('project_1/file_1.sql')
'/tmp/git_root/project_1/project_1/file_1.sql'

(note the double project_1)

If we could make git diff always return absolute path, then that could be added as a default internal argument on GitDiff. Or there are git commands such as git rev-parse --show-toplevel that will return the path to the git root, and then that could be joined with the existing path that we already get from git diff.

Bachmann1234 commented 3 years ago

🤔 gotcha.

Yeah, I think I like using rev-parse to get the git root. I dont ... think .... you can get the absolute path from git diff.

but ive been wrong once in this convo already!

fernandobrito commented 3 years ago

I can try a PR if you are busy.

Update: ah wait, I was looking for a good place to put such call to rev-parse but it's already there in the codebase 😮 . https://github.com/Bachmann1234/diff_cover/blob/master/diff_cover/git_path.py#L66. I guess I just have to use it before we pass the path to the tool.

fernandobrito commented 2 years ago

Hi @Bachmann1234, sorry for the absence, but now I'm back 👋 .

For our use case in my company (running diff-quality on a subfolder on a mono-repository) we need both (1) to have absolute paths provided to the linting tool but also (2) that only the modified files within that subfolder are passed to the linting tool.

I just saw that one month after my last comment in this issue, @kasium contributed with a --include feature (thanks! 🙌 ). Later this month I will try to implement the absolute path missing part and give an update here.

fernandobrito commented 2 years ago

I gave it a quick try this weekend and it looks more complicated than I thought 😅.

My goal was to have either the absolute path or the relative path (relative to the current folder, and not relative to the git root) passed to the drivers, but then I realized that I might need to convert the paths back to being git root relative after parsing the command results and before passing it back to the report generator.

I don't think I will really have time to go deeper into this, and since no one else upvoted this issue or commented on it, maybe my needs are too specific and I should just close the issue?

Something I did notice is that when calling diff-quality from subfolders, the results are a bit misleading in my opinion since files are not processed but reported as 100%.

Example:

  1. Clone this repository
  2. Go to a file and add a violation. Example: add violation = f"" in diff_cover/violationsreporters/base.py
  3. If I run diff-quality --violations=flake8, I get 1 violation, as expected
  4. If I cd diff_cover and run the same, I get diff_cover/violationsreporters/base.py (100%).

The file was not processed/passed to flake8 because this line (https://github.com/Bachmann1234/diff_cover/blob/main/diff_cover/violationsreporters/base.py#L151) is using the git root relative path (diff_cover/violationsreporters/base.py in this case) and checking if it exists based on the current directory, but since I'm not in the git root, but in a subfolder, this if fails, but on the final report the file is shown as 100%. Should it maybe log a warning or somehow give a sign to the user that this file was never processed? Or maybe it's already obvious enough that we should always run diff-quality from the git root?

kasium commented 2 years ago

@fernandobrito maybe the work in https://github.com/Bachmann1234/diff_cover/commit/dfd593605a8990d6f930bac1a65f24b42d7d3eca can also help?