Kentzo / git-archive-all

A python script wrapper for git-archive that archives a git superproject and its submodules, if it has any. Takes into account .gitattributes
MIT License
372 stars 81 forks source link

Patterns in .gitattributes do not work if they end with a slash #87

Closed doxygen-spammer closed 3 years ago

doxygen-spammer commented 3 years ago

The patterns in .gitattributes are allowed to end with a slash, in which case they match only directories, not files.

Apparently, this is not supported correctly by git-archive-all. Running this in a shell:

mkdir test
cd test
git init
mkdir include dont-include-1 dont-include-2
touch include/a.txt dont-include-1/b.txt dont-include-2/c.txt
echo dont-include-1/ export-ignore >>.gitattributes
echo dont-include-2 export-ignore >>.gitattributes
git add -A
git commit -m "Test"
git-archive-all ../test.zip
zipinfo ../test.zip

results in:

Archive:  ../test.zip
Zip file size: 415 bytes, number of entries: 3
-rw-rw-r--  2.0 unx       59 b- defN 21-Sep-22 17:12 test/.gitattributes
-rw-rw-r--  2.0 unx        0 b- defN 21-Sep-22 17:12 test/dont-include-1/b.txt
-rw-rw-r--  2.0 unx        0 b- defN 21-Sep-22 17:12 test/include/a.txt
3 files, 59 bytes uncompressed, 41 bytes compressed:  30.5%

(on Ubuntu 20.04.)

.gitattributes is now:

dont-include-1/ export-ignore
dont-include-2 export-ignore

Compare against git:archive:

git archive --output ../test2.zip HEAD
zipinfo ../test2.zip

results in:

Archive:  ../test2.zip
Zip file size: 451 bytes, number of entries: 3
-rw----     0.0 fat       59 tx defN 21-Sep-22 17:12 .gitattributes
drwx---     0.0 fat        0 bx stor 21-Sep-22 17:12 include/
-rw----     0.0 fat        0 tx stor 21-Sep-22 17:12 include/a.txt
3 files, 59 bytes uncompressed, 37 bytes compressed:  37.3%
Kentzo commented 3 years ago

From the documentation on gitattributes:

patterns that match a directory do not recursively match paths inside that directory (so using the trailing-slash path/ syntax is pointless in an attributes file; use path/** instead)

In other words the pattern needs to be changed from dont-include-1/ to dont-include-1/**. My understanding is confirmed by the behavior of git check-attr (which the script relies on to determine whether a file needs to be excluded).

With pattern specified as dont-include-1/:

> git check-attr export-ignore -- dont-include-1 dont-include-1/b.txt dont-include-2 dont-include-2/c.txt include include/a.txt 
dont-include-1: export-ignore: unspecified
dont-include-1/b.txt: export-ignore: unspecified
dont-include-2: export-ignore: set
dont-include-2/c.txt: export-ignore: unspecified
include: export-ignore: unspecified
include/a.txt: export-ignore: unspecified

With pattern specified as dont-include-1/**:

> git check-attr export-ignore -- dont-include-1 dont-include-1/b.txt dont-include-2 dont-include-2/c.txt include include/a.txt
dont-include-1: export-ignore: unspecified
dont-include-1/b.txt: export-ignore: set
dont-include-2: export-ignore: set
dont-include-2/c.txt: export-ignore: unspecified
include: export-ignore: unspecified
include/a.txt: export-ignore: unspecified

Perhaps you should report this discrepancy as a bug to the maintainers of git:

Tested using

> git --version
git version 2.31.1
doxygen-spammer commented 3 years ago

Good point.

I interpret the documentation that git archive can exclude directories, while git check-attr assumes that only files are relevant. The documentation for gitattributes says that matching directories is pointless, but also mentions that directories are relevant for export-ignore (but not for export-subst). Therefore I think that git check-attr should also assume directories to be relevant.

The documentation is at least misleading IMHO, so I am going to report it to git.

Kentzo commented 3 years ago

Please post a link to the report here for reference.

doxygen-spammer commented 3 years ago

Sure https://lore.kernel.org/git/22364178.6Emhk5qWAg@doro/T/#u

qmonnet commented 1 year ago

Hello, did this report lead to a fix? I'm seeing the same in recent versions of the documentation. For what it's worth, I pinned down the commit adding support for excluding files with the dir/ export-ignore syntax in git archive: git/git@5ff247ac0cc9b2d09c8c24bd0c8df12eed94aebc.

It would be nice to have git-archive-all and git archive consistent on that, although looking at the code I see the script checks attributes for each individual file and does not walk through directories, so it's maybe not a trivial change?

doxygen-spammer commented 1 year ago

AFAIK not. I mailed the git mailing list, but without any response.

The problem appears to be that git-archive behaves different than documented, and git-archive-all does it “correct” instead.

Kentzo commented 1 year ago

I doubt they will ever fix this dues to backward compatibility. With enough discrepancies I could add "as git-archive" mode, but there are not complaints for that while, IIRC, you could just change the pattern to work with both.

@qmonnet Maybe bump that thread on kernel.org, see if there is someone to help this time around?

qmonnet commented 1 year ago

Thank you both for your answers! I agree that the change of pattern is trivial and can fix the issue; it's not something big, just that the difference in behaviour (or inconsistencies in Git's docs) can lead to surprise/confusion.

It would be nice to document this difference somewhere in git-archive-all, maybe?

I'll try to bump the thread on the ML, good idea, if I can find the time (won't be this week). [EDIT - The repo I'm interested in changed the pattern in its .gitattributes, I won't be pushing on the ML thread this time.]

brlin-tw commented 11 months ago

I have also encountered these behavior differences, though I would suggest mimicking the git archive command behavior entirely as IMO users are more expect the behavior to be the same with this command, not git check-attr.

In my use case(Git 2.34.1) for the pattern /continuous-integration/:

$ tree continuous-integration/
continuous-integration/
├── create-gitlab-release.sh
...stripped...
$ git-archive-all --dry-run test.tar | grep continuous-integration
/continuous-integration/create-gitlab-release.sh => test/continuous-integration/create-gitlab-release.sh
...stripped...
$ git archive HEAD | tar t | grep continuous-integration
(nothing)
$ git check-attr export-ignore -- continuous-integration
continuous-integration: export-ignore: set
$ git check-attr export-ignore -- continuous-integration/create-gitlab-release.sh
continuous-integration/create-gitlab-release.sh: export-ignore: unspecified

References