Kotlin / binary-compatibility-validator

Public API management tool
Apache License 2.0
800 stars 59 forks source link

Remove case-insensitive dump file names handling logic #231

Closed fzhinkin closed 3 months ago

fzhinkin commented 4 months ago

https://github.com/Kotlin/binary-compatibility-validator/pull/79 and https://github.com/Kotlin/binary-compatibility-validator/pull/84 addressed the issue reported in https://github.com/Kotlin/binary-compatibility-validator/issues/76 by looking up .api files in a case-insensitive manner.

To make things work smoothly on the Gradle side, we need to:

The latter option should work, but it requires specifying input/output directories and a file name for all the tasks. Otherwise, Gradle won't be able to track dependencies between tasks correctly. Also, all tasks consuming .api files from the repo should be updated to support case-insensitive file names (and for Klibs, the majority of tasks need it).

Note that before all the klib-related changes, the check task was using directories as its inputs, while the dump task was using files as its output, so the only way to enforce dependency between these tasks was dependsOn call.

Looking at #76 I'm not sure if macOS's case insensitivity was the root cause. The :apiDump task was based on Sync, which wipes out the destination directory before copying files. Even if the dump had a name with the wrong characters case, rerunning apiDump should have fixed it.

So, I propose removing all the logic for case-insensitive file names from the BCV and using exact file paths as Gradle task inputs/outputs.

fzhinkin commented 4 months ago

@ZacSweers, @joffrey-bion, @qwwdfsad PTAL. You were all involved in the original issue (#76) discussion, so I'd love to hear your thoughts on the topic.

ZacSweers commented 4 months ago

Definitely in favor of case-insensitivity 👍

fzhinkin commented 4 months ago

@ZacSweers do you have a project that relies on it?

ZacSweers commented 4 months ago

I don't have any projects that desire case sensitivity. The last issue was that it unexpectedly failed on CI due to CI running linux

fzhinkin commented 4 months ago

Looking at #76 once again. The problem is how git works on case-insensitive file systems:

joffrey-bion commented 4 months ago

Wouldn't it "just work(TM)" to make the BCV delete the old file in a case-insensitive manner, and recreate it with an exact match of the project name every time?

EDIT: ah sorry that's the case where git would just update the wrongly-cased file in the index instead of renaming it.

fzhinkin commented 4 months ago

BCV was always deleting the old file, the problem is how git tracks deletion and creation of a new file:

$ echo "yo" > AAA.txt; git add AAA.txt; git commit -a -m 'added AAA.txt' $ rm AAA.txt; echo YO > aaa.txt $ git add . $ git status On branch main Changes to be committed: (use "git restore --staged ..." to unstage) modified: AAA.txt

$ ls aaa.txt


- with `core.ignorecase=false`

$ git init $ echo "yo" > AAA.txt; git add AAA.txt; git commit -a -m 'added AAA.txt'

please take care of case-insentiveness

$ git config core.ignorecase false $ rm AAA.txt; echo YO > aaa.txt $ git add . $ git status On branch main Changes to be committed: (use "git restore --staged ..." to unstage) modified: AAA.txt new file: aaa.txt

$ echo '🤡'


The only proper way to handle file name case change is `git mv`:

$ git mv AAA.txt aaa.txt $ git status On branch main Changes to be committed: (use "git restore --staged ..." to unstage) deleted: AAA.txt new file: aaa.txt

joffrey-bion commented 4 months ago

Yes, understood. Sorry I was too slow to update my comment apparently.

I can't believe git works this way by default. This is quite enraging. Now, is it the responsibility of BCV to deal with it? Maybe, maybe not. If we can make the experience better, why not.

So, based on what you said, even being strict from the BCV side during apiDump doesn't guarantee anything, because people could change the case of their project name between 2 generations, and git will happily save the wrong case.

I guess that leaves us no choice but to be case insensitive on the apiCheck side.

fzhinkin commented 4 months ago

In general, it's nice to provide a better user experience.

However, those who use core.ignorecase false continue having problems (althougt it should be easier to spot the problem). Also, as Gradle tasks have to use directories as their inputs instead of precise file locations, and also because we store both JVM and Klibs dump in the same folder, every update of one of these dumps will invalidate all dependent tasks, which is not very nice.

joffrey-bion commented 4 months ago

if core.ignorecase (introduced in git v1.5.6) is true (that would be the default)

with default git repo settings

For the record, this is actually not quite correct. Git's default depends on the case-sensitivity of the local OS. So in case-sensitive systems, core.ignorecase will be false by default, which is best.

Reference: https://git-scm.com/docs/git-config#Documentation/git-config.txt-coreignoreCase

In general, it's nice to provide a better user experience

Agreed. Although I would argue that if we get annoying Gradle behavior with task invalidation, it's worth reconsidering what experience is best.

Basically, if apiDump behaves correctly, and people are just committing wrong cases due to git's behavior on their machine, there is little BCV could do about it. The repository's state will be "incorrect" in the sense that the case of the .api file will not match the case of the project. We could indeed decide to be case-insensitive for apiCheck, but it still doesn't solve the broken state of the repo.

joffrey-bion commented 4 months ago

However, those who use core.ignorecase false continue having problems (althougt it should be easier to spot the problem).

Could you please expand on that? If people use core.ignorecase=false, the apiDump task will always generate correct files, and git will commit them in the correct case, right?

fzhinkin commented 4 months ago

For the record, this is actually not quite correct. Git's default depends on the case-sensitivity of the local OS. So in case-sensitive systems, core.ignorecase will be false by default, which is best.

Well, yeah, that's true. I meant how it works on macOs & Windows.

Could you please expand on that? If people use core.ignorecase=false, the apiDump task will always generate correct files, and git will commit them in the correct case, right?

Not quite. Without git mv, git can't detect old file removal but sees only new file creation. So, a user may end up with the two files in repo:

$ git init;
$ git config core.ignorecase false
$ echo 123 > zzz
$ git add zzz; git commit -a -m 'initial'
$ mv zzz ZZZ
$ git add ZZZ; git commit -a -m 'rename'
$ git ls-files
ZZZ
zzz
joffrey-bion commented 4 months ago

Not quite. Without git mv, git can't detect old file removal but sees only new file creation. So, a user may end up with the two files in repo

I don't quite get that point. I mean, this would happen in all systems, and with all renames, regardless of whether it's just a case change or not. It's just user error with git commands (just doing git add ZZZ isn't sufficient to register the whole rename, you would need a git add zzz too, or to use git mv in the first place, or just commit this from your IDE which should deal with it properly).

Also, if the user renames their project (which is the case that leads to this kind of stuff), won't they have to deal with this manually for the directory itself and other files named after the project anyway?

fzhinkin commented 4 months ago

I don't quite get that point. I mean, this would happen in all systems, and with all renames, regardless of whether it's just a case change or not. It's just user error with git commands (just doing git add ZZZ isn't sufficient to register the whole rename, you would need a git add zzz too, or to use git mv in the first place, or just commit this from your IDE which should deal with it properly).

Assuming that a user uses an IDE to work with a project, if a file rename is performed using GUI, everything will go through underlying VCS's commands, and the rename will be handled correctly.

But it's not the case for BCV, where a file is renamed during Gradle's command execution.

As a user who renamed a project and rerun apiDump, I'll see the following in IDEA's CSV commit view:

Also, if the user renames their project (which is the case that leads to this kind of stuff), won't they have to deal with this manually for the directory itself and other files named after the project anyway?

That poses a question: why should BCV handle scenarios when someone forgets to perform some related actions after renaming a project?

joffrey-bion commented 4 months ago

But it's not the case for BCV, where a file is renamed during Gradle's command execution.

Indeed, and I'm talking about the commit action, not the rename action. For sure, git mv is not an option here because the user doesn't control the rename action (it's done by BCV), but the commit action is definitely something done by the user, via the IDE or the command line. My point was that, if the user does git add only on the new name (and not the old), it's a user error.

As a user who renamed a project and rerun apiDump, I'll see the following in IDEA's CSV commit view if ignorecase is false, there will be no changes in an old file. So nothing would hint me to remove an old file from index.

If this were the case in the IDE, I believe that should be reported as an IDE bug. But my experiment doesn't show this (at least on my Windows machine). Here is my result:

PS C:\Projects\test-git> git init
Initialized empty Git repository in C:/Projects/test-git/.git/
PS C:\Projects\test-git> git config user.name "Test User"
PS C:\Projects\test-git> git config user.email "test-user@noreply.com"
PS C:\Projects\test-git> git config core.ignorecase false
PS C:\Projects\test-git> echo 123 > zzz
PS C:\Projects\test-git> git add zzz
PS C:\Projects\test-git> git commit -m "initial zzz"
[main (root-commit) 5c35d17] initial zzz
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 zzz
PS C:\Projects\test-git> mv zzz ZZZ
PS C:\Projects\test-git> git status
On branch main
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    zzz

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        ZZZ

no changes added to commit (use "git add" and/or "git commit -a")

image

We see both the deleted and the created file, so we're supposed to add both of those and commit both as well. And this works fine. It is even considered a proper rename by git if done in a single commit:

PS C:\Projects\test-git> git add zzz
PS C:\Projects\test-git> git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    zzz

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        ZZZ

PS C:\Projects\test-git> git add ZZZ
PS C:\Projects\test-git> git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        renamed:    zzz -> ZZZ

if ignorecase is true, there will be no changes in .api files;

I was mainly talking about the ignorecase=false case, but yeah in the default case where ignorecase=true, it's a bit annoying, but that's not really a concern of BCV. It's just how git works, and if people can't commit the file, it makes sense to investigate on the git side of things.

Also, if the user renames their project (which is the case that leads to this kind of stuff), won't they have to deal with this manually for the directory itself and other files named after the project anyway?

That poses a question: why should BCV handle scenarios when someone forgets to perform some related actions after renaming a project?

Exactly. To be frank, BCV is not even supposed to know git exists, let alone the fact that git has any feature related to the filename case. If people don't manage to commit the case change, they should probably open issues with git instead.

fzhinkin commented 4 months ago

I'm leaning toward API dump filename case mismatch no longer being tolerable on case-sensitive filesystems.

The decision could always be reconsidered, but if the issue becomes annoying, I'd rather detach dump file names from project names (#233).