InsightSoftwareConsortium / ITK

Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.
https://itk.org
Apache License 2.0
1.47k stars 686 forks source link

STYLE: Add historical bulk changes to git blame ignore #5162

Closed ebrahimebrahim closed 2 weeks ago

ebrahimebrahim commented 3 weeks ago

Close #2623

In case this needs to be re-done in the future, I used the following script in a python environment with GitPython installed:

import git
from pathlib import Path

def find_style_commits_with_large_changes(repo : git.Repo, prefix="STYLE", file_change_threshold=1000) -> list[str]:
    matching_commits = []
    for commit in repo.iter_commits():
        if commit.message.strip().startswith(prefix):
            num_files_changed = len(commit.stats.files)
            if num_files_changed >= file_change_threshold:
                matching_commits.append(commit.hexsha)

    return matching_commits

def get_commits_already_ignored(blame_ignore_revs_path) -> list[str]:
    return [
        line.strip()
        for line in blame_ignore_revs_path.read_text().split('\n')
        if line and not line.startswith('#')
    ]

if __name__ == "__main__":
    repo = git.Repo('.')
    style_commits = find_style_commits_with_large_changes(repo)
    blame_ignore_revs_path = Path("./.git-blame-ignore-revs")
    commits_already_ignored = get_commits_already_ignored(blame_ignore_revs_path)
    commits_to_add = [commit_hash for commit_hash in style_commits if commit_hash not in commits_already_ignored]
    entries_to_add : list[str] = []
    for commit_hash in commits_to_add:
        commit_message_title = repo.commit(commit_hash).message.split('\n')[0]
        entries_to_add.append(f"# {commit_message_title}\n{commit_hash}")
    text_to_add = '\n'.join(entries_to_add)
    with blame_ignore_revs_path.open('a') as file:
        file.write(text_to_add)

The file_change_threshold was 1001000, so the commits being added to the git-blame-ignore are those commits that both start with "STYLE" and that affect at least 1001000 files.


PR Checklist

Refer to the ITK Software Guide for further development details if necessary.

N-Dekker commented 3 weeks ago

Thanks, but I'm not sure about the whole .git-blame-ignore-revs feature. When you want to find out what particular commit changed the code, why would you want to always ignore commits that affect more than 100 files?

Would it not be possible for such a commit to introduce a significant change? I have seen large commits that include some manual changes, possibly introducing unexpected changes.

ebrahimebrahim commented 3 weeks ago

@N-Dekker It's not about the number of files really, it's more about the fact that they are STYLE changes

Would it not be possible for such a commit to introduce a significant change? I have seen large commits that include some manual changes, possibly introducing unexpected changes.

By definition if a change is marked as STYLE then the author is expressing that the change is not significant. These changes are largely not relevant to what one is looking for during a blame.

Also, this is all optional because the user can make git blame behave as they wish (--ignore-revs-file versus --no-ignore-revs), disabling in case one is actually interested in blaming through style changes

N-Dekker commented 3 weeks ago

It's not about the number of files really, it's more about the fact that they are STYLE changes

@ebrahimebrahim Suppose you want to find out when a particular bug is introduced. In general it might also be introduced by a STYLE commit, right? So you still want to see STYLE commits in a blame.

this is all optional because the user can make git blame behave as they wish (--ignore-revs-file versus --no-ignore-revs)

Very interesting, thanks. I think I want to use that option: --no-ignore-revs! Do you happen to know if it can be enabled for TortoiseGit and the GitHub website?

For example, when I look at https://github.com/InsightSoftwareConsortium/ITK/blame/master/Modules/Core/Common/include/itkObject.h and press [Blame], it says:

(i) Ignoring revisions in .git-blame-ignore-revs.

In the TortoiseGitBlame settings, I don't see how to switch off ignoring revisions either 🤷

ebrahimebrahim commented 3 weeks ago

Do you happen to know if it can be enabled for TortoiseGit and the GitHub website?

I'm not familiar with tortoisegit but based on this issue it looks like the --no-ignore-revs is the default behavior as they haven't added the "feature" yet -- idk if that info is up to date

As for github I don't see any way to turn it off :/

In general it might also be introduced by a STYLE commit, right? So you still want to see STYLE commits in a blame.

I guess it comes down to what we feel ought to be the default behavior. If we want the default to be to not ignore style commits then one approach may be to rename the .git-blame-ignore-revs file so that github cannot find it -- then users who want to ignore style commits can explicitly use --ignore-revs-file to point to the renamed file. My sense is that the default behavior ought to be to ignore style commits, since they are largely red herrings when seeking out what caused a certain change or broke something, but it's up for debate

thewtex commented 3 weeks ago

I think 100 is too small here -- 1000 files is closer to a "bulk change" in a project as large as ITK.

ebrahimebrahim commented 2 weeks ago

@N-Dekker Suppose I switch to Matt's suggested "bulk style changes" threshold of 1000 files instead of 100 files -- would that do for you?

Alternatively there is my earlier suggestion of renaming .git-blame-ignore-revs such that it is not automatically used by github, but again that comes down to the still unsettled question on what is the most reasonable default blame behavior.

N-Dekker commented 2 weeks ago

Suppose I switch to Matt's suggested "bulk style changes" threshold of 1000 files instead of 100 files -- would that do for you?

Thank you for asking, @ebrahimebrahim. As I'm not a fan of ignoring commits in the first place, I would certainly prefer a threshold of 1000 files, instead of 100. But of course, any threshold is arbitrary. If I would want to ignore commits at all, I would only want to ignore those that just do source code formatting adjustments (using clang-format, python black, etc).

Having said so, I very much appreciate your effort. 👍 I think I'll leave it up to the other ITK-ers how to proceed on this pull request.

dzenanz commented 2 weeks ago

@ebrahimebrahim Let's raise the limit to 1000, and manually add/exclude the few identified exceptions.