Open newren opened 4 days ago
/submit
Submitted as pull.1831.git.1732557520428.gitgitgadget@gmail.com
To fetch this version into FETCH_HEAD:
git fetch https://github.com/gitgitgadget/git/ pr-1831/newren/disallow-dotdot-fast-import-v1
To fetch this version to local tag pr-1831/newren/disallow-dotdot-fast-import-v1:
git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1831/newren/disallow-dotdot-fast-import-v1
On the Git mailing list, Eric Sunshine wrote:
On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> If a user specified e.g.
> M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file". The top-level ".." directory
> causes problems. While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file. Simply avoid creating this bad history in
> the first place.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> @@ -1466,6 +1466,9 @@ static int tree_content_set(
> e->name = to_atom(p, n);
> + if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> + die("path %s contains invalid component", p);
> + }
Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.
(And -- style nit -- the braces could be dropped.)
User Eric Sunshine <sunshine@sunshineco.com>
has been added to the cc: list.
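For readers comparing the two spellings of the check discussed above, here is a minimal standalone sketch. Both function names are hypothetical and exist only for this illustration; the second one mirrors what a helper such as is_dot_or_dotdot() is expected to do, but it is not copied from Git's sources.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the strcmp()-based test in the v1 patch:
 * nonzero only when the name is exactly "." or "..".
 */
int sketch_dot_check_strcmp(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/*
 * Hypothetical character-wise equivalent, assumed to behave like an
 * is_dot_or_dotdot()-style helper: a leading '.', followed either by
 * the end of the string or by one more '.' and then the end.
 */
int sketch_dot_check_chars(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' ||
		(name[1] == '.' && name[2] == '\0'));
}
```

Both forms should agree on every input: they return nonzero for "." and "..", and zero for names such as "...", ".git", or the empty string.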
On the Git mailing list, Elijah Newren wrote:
On Mon, Nov 25, 2024 at 10:15 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Mon, Nov 25, 2024 at 12:58 PM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > If a user specified e.g.
> > M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file". The top-level ".." directory
> > causes problems. While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file. Simply avoid creating this bad history in
> > the first place.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > @@ -1466,6 +1466,9 @@ static int tree_content_set(
> > e->name = to_atom(p, n);
> > + if (!strcmp(e->name->str_dat, ".") || !strcmp(e->name->str_dat, "..")) {
> > + die("path %s contains invalid component", p);
> > + }
>
> Probably not worth a reroll, but is_dot_or_dotdot() might be usable here.
>
> (And -- style nit -- the braces could be dropped.)
Good catches, thanks. I think they are worth a reroll; I'll send one in.
User Elijah Newren <newren@gmail.com>
has been added to the cc: list.
/submit
Submitted as pull.1831.v2.git.1732561248717.gitgitgadget@gmail.com
To fetch this version into FETCH_HEAD:
git fetch https://github.com/gitgitgadget/git/ pr-1831/newren/disallow-dotdot-fast-import-v2
To fetch this version to local tag pr-1831/newren/disallow-dotdot-fast-import-v2:
git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1831/newren/disallow-dotdot-fast-import-v2
This patch series was integrated into seen via https://github.com/git/git/commit/0a88e9e8210ee1d7139680d8ecd44d45f1f81a20.
On the Git mailing list, Patrick Steinhardt wrote:
On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> If a user specified e.g.
> M 100644 :1 ../some-file
> then fast-import previously would happily create a git history where
> there is a tree in the top-level directory named "..", and with a file
> inside that directory named "some-file". The top-level ".." directory
> causes problems. While git checkout will die with errors and fsck will
> report hasDotdot problems, the user is going to have problems trying to
> remove the problematic file. Simply avoid creating this bad history in
> the first place.
Makes sense.
More generally this made me wonder whether we should maybe extract some
bits out of "fsck.c" so that we don't have to duplicate the checks done
there in git-fast-import(1). This would for example include checks for
".git" and its HFS/NTFS variants as well as tree entry length checks for
names longer than 4096 characters.
This of course does not have to be part of your patch, which looks good
to me.
Thanks!
Patrick
User Patrick Steinhardt <ps@pks.im>
has been added to the cc: list.
This patch series was integrated into seen via https://github.com/git/git/commit/7ccbb69c16804a8a084a9ccc64e5ab90c6eacb31.
This patch series was integrated into next via https://github.com/git/git/commit/8b145bb54386aff918e691857e0d791314661933.
On the Git mailing list, "Kristoffer Haugsbakk" wrote:
Hi. I see that this is in `next` now so the following might
be irrelevant.
On Mon, Nov 25, 2024, at 20:00, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> [...]
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 76d5c20f141..995ef76f9d6 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -1466,6 +1466,8 @@ static int tree_content_set(
> root->tree = t = grow_tree_content(t, t->entry_count);
> e = new_tree_entry();
> e->name = to_atom(p, n);
> + if (is_dot_or_dotdot(e->name->str_dat))
> + die("path %s contains invalid component", p);
Nit: single-quoting the path seems more common:
$ git grep "\"path '%s'" ':!po/' | wc -l
17
$ git grep "\"path %s" ':!po/' | wc -l
4
> e->versions[0].mode = 0;
> oidclr(&e->versions[0].oid, the_repository->hash_algo);
> t->entries[t->entry_count++] = e;
> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 6224f54d4d2..caf3dc003a0 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -522,6 +522,26 @@ test_expect_success 'B: fail on invalid committer (5)' '
> test_must_fail git fast-import <input
> '
>
> +test_expect_success 'B: fail on invalid file path' '
> + cat >input <<-INPUT_END &&
> + blob
> + mark :1
> + data <<EOF
> + File contents
> + EOF
> +
> + commit refs/heads/badpath
> + committer Name <email> $GIT_COMMITTER_DATE
> + data <<COMMIT
> + Commit Message
> + COMMIT
> + M 100644 :1 ../invalid-path
Maybe the test could be parameterized so that both `..` and `.` can
be tested? Like in `test_path_eol_success`.
--
Kristoffer Haugsbakk
User "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>
has been added to the cc: list.
On the Git mailing list, Junio C Hamano wrote:
"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:
>> + if (is_dot_or_dotdot(e->name->str_dat))
>> + die("path %s contains invalid component", p);
>
> Nit: single-quoting the path seems more common:
>
> $ git grep "\"path '%s'" ':!po/' | wc -l
> 17
> $ git grep "\"path %s" ':!po/' | wc -l
> 4
Ah, I missed that one. Thanks for catching.
We probably should write it down.
--- >8 ---
[PATCH] CodingGuidelines: a handful of error message guidelines
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.
Let's write down established best practice we are aware of.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* I am writing what I think is the established practice from
memory; clarifications, corrections, and additions are all
welcome.
Documentation/CodingGuidelines | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
index 87904791cb..0444391983 100644
--- c/Documentation/CodingGuidelines
+++ w/Documentation/CodingGuidelines
@@ -703,16 +703,22 @@ Program Output
Error Messages
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
- Do not capitalize the first word, only because it is the first word
- in the message ("unable to open %s", not "Unable to open %s"). But
+ in the message ("unable to open '%s'", not "Unable to open '%s'"). But
"SHA-3 not supported" is fine, because the reason the first word is
capitalized is not because it is at the beginning of the sentence,
but because the word would be spelled in capital letters even when
it appeared in the middle of the sentence.
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+ e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages should
+ be marked for `_("translation")`.
Externally Visible Names
On the Git mailing list, Jeff King wrote:
On Tue, Nov 26, 2024 at 07:57:57AM +0100, Patrick Steinhardt wrote:
> On Mon, Nov 25, 2024 at 07:00:48PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > If a user specified e.g.
> > M 100644 :1 ../some-file
> > then fast-import previously would happily create a git history where
> > there is a tree in the top-level directory named "..", and with a file
> > inside that directory named "some-file". The top-level ".." directory
> > causes problems. While git checkout will die with errors and fsck will
> > report hasDotdot problems, the user is going to have problems trying to
> > remove the problematic file. Simply avoid creating this bad history in
> > the first place.
>
> Makes sense.
>
> More generally this made me wonder whether we should maybe extract some
> bits out of "fsck.c" so that we don't have to duplicate the checks done
> there in git-fast-import(1). This would for example include checks for
> ".git" and its HFS/NTFS variants as well as tree entry length checks for
> names longer than 4096 characters.
I had the same thought, but I think the right code to be using is
verify_path(). That's what ultimately is used to let names into the
index from trees, from update-index, or from other tools like git-apply.
So I'd consider that authoritative, and fsck is mostly trying to follow
those rules while looking at only a single tree at a time. But
fast-import should have the whole path as a string, just like the index
code does.
-Peff
User Jeff King <peff@peff.net>
has been added to the cc: list.
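The idea of validating the whole path string in one place, rather than one tree entry at a time, can be sketched roughly as follows. This is a simplified illustration, not Git's verify_path(): the names sketch_path_ok() and sketch_is_dotgit() and the constant are hypothetical, the 4096 limit is taken from the fsck check mentioned above, rejecting empty components is an assumption of this sketch, and the HFS/NTFS variants of ".git" are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit, taken from the name-length check mentioned in the thread. */
#define SKETCH_MAX_COMPONENT 4096

/* Return 1 if the n-byte component is ".git", ignoring ASCII case. */
int sketch_is_dotgit(const char *s, size_t n)
{
	static const char dotgit[] = ".git";
	size_t i;

	if (n != 4)
		return 0;
	for (i = 0; i < 4; i++) {
		char c = s[i];
		if (c >= 'A' && c <= 'Z')
			c = (char)(c - 'A' + 'a');
		if (c != dotgit[i])
			return 0;
	}
	return 1;
}

/*
 * Walk a slash-separated path and return 1 if every component is
 * acceptable under the simplified rules of this sketch, 0 otherwise.
 */
int sketch_path_ok(const char *path)
{
	const char *p = path;

	if (!*p)
		return 0; /* an empty path names nothing */
	for (;;) {
		const char *slash = strchr(p, '/');
		size_t n = slash ? (size_t)(slash - p) : strlen(p);

		if (n == 0)
			return 0; /* leading, trailing, or doubled slash */
		if (n > SKETCH_MAX_COMPONENT)
			return 0; /* component name too long */
		if (n == 1 && p[0] == '.')
			return 0; /* "." component */
		if (n == 2 && p[0] == '.' && p[1] == '.')
			return 0; /* ".." component */
		if (sketch_is_dotgit(p, n))
			return 0; /* ".git" in any ASCII case */
		if (!slash)
			return 1; /* last component passed */
		p = slash + 1;
	}
}
```

Under these assumptions, "../some-file" and "a/.GIT/config" are refused while "dir/file" and ".gitignore" pass. Because the function sees the full string, no per-tree bookkeeping is needed.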
On the Git mailing list, Eric Sunshine wrote:
On Wed, Nov 27, 2024 at 8:23 AM Junio C Hamano <gitster@pobox.com> wrote:
> [PATCH] CodingGuidelines: a handful of error message guidelines
>
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines
> @@ -703,16 +703,22 @@ Program Output
> Error Messages
>
> - - Do not end error messages with a full stop.
> + - Do not end a single-sentence error message with a full stop.
>
> - Do not capitalize the first word, only because it is the first word
> - in the message ("unable to open %s", not "Unable to open %s"). But
> + in the message ("unable to open '%s'", not "Unable to open '%s'"). But
> "SHA-3 not supported" is fine, because the reason the first word is
> capitalized is not because it is at the beginning of the sentence,
> but because the word would be spelled in capital letters even when
> it appeared in the middle of the sentence.
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> + e.g. `die(_("unable to open '%s'"), path)`.
These changes all seem fine.
> + - Unless there is a compelling reason not to, error messages should
> + be marked for `_("translation")`.
We might want to spell this out more fully, such as stating that
messages from porcelain commands should be marked for translation, but
messages in plumbing should not. Also, perhaps mention explicitly that
BUG("message") should not be marked for translation since they are
intended to be read by Git developers, not by end-users.
On the Git mailing list, Junio C Hamano wrote:
Jeff King <peff@peff.net> writes:
> I had the same thought, but I think the right code to be using is
> verify_path(). That's what ultimately is used to let names into the
> index from trees, from update-index, or from other tools like git-apply.
Yeah, I agree that is the right helper to use.
On the Git mailing list, Junio C Hamano wrote:
Taking input from comments by Eric (thanks) on the previous round,
this iteration adds a bit more about Porcelain/Plumbing and BUG().
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 71e4742fd5..2b8f99f333 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -703,8 +703,15 @@ Error Messages
- Enclose the subject of an error inside a pair of single quotes,
e.g. `die(_("unable to open '%s'"), path)`.
- - Unless there is a compelling reason not to, error messages should
- be marked for `_("translation")`.
+ - Unless there is a compelling reason not to, error messages from the
+ Porcelain command should be marked for `_("translation")`.
+
+ - Error messages from the plumbing commands are sometimes meant for
+ machine consumption and should not be marked for `_("translation")`
+ to keep them 'grep'-able.
+
+ - BUG("message") are for communicating the specific error to
+ developers, and not to be translated.
Externally Visible Names
--- >8 ---
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.
Let's write down established best practice we are aware of.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Documentation/CodingGuidelines | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 3263245b03..2b8f99f333 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -689,16 +689,29 @@ Program Output
Error Messages
- - Do not end error messages with a full stop.
+ - Do not end a single-sentence error message with a full stop.
- Do not capitalize the first word, only because it is the first word
- in the message ("unable to open %s", not "Unable to open %s"). But
+ in the message ("unable to open '%s'", not "Unable to open '%s'"). But
"SHA-3 not supported" is fine, because the reason the first word is
capitalized is not because it is at the beginning of the sentence,
but because the word would be spelled in capital letters even when
it appeared in the middle of the sentence.
- - Say what the error is first ("cannot open %s", not "%s: cannot open")
+ - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
+
+ - Enclose the subject of an error inside a pair of single quotes,
+ e.g. `die(_("unable to open '%s'"), path)`.
+
+ - Unless there is a compelling reason not to, error messages from the
+ Porcelain command should be marked for `_("translation")`.
+
+ - Error messages from the plumbing commands are sometimes meant for
+ machine consumption and should not be marked for `_("translation")`
+ to keep them 'grep'-able.
+
+ - BUG("message") are for communicating the specific error to
+ developers, and not to be translated.
Externally Visible Names
--
2.47.1-499-g8536fed62d
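As a rough illustration of the message guidelines in the patch above, the following sketch flags a few of the patterns they discourage. The function and flag names are hypothetical, and the checks are crude heuristics (for example, every %s is treated as the subject of the message), so this is a reading aid rather than a tool the project uses.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flags, one per guideline this sketch looks at. */
enum {
	SKETCH_TRAILING_STOP = 1,    /* single sentence ending in '.' */
	SKETCH_CAPITALIZED = 2,      /* "Unable ..." rather than "unable ..." */
	SKETCH_UNQUOTED_SUBJECT = 4  /* %s not wrapped in single quotes */
};

/* Return a bitmask of the guideline violations found in fmt. */
unsigned sketch_lint_message(const char *fmt)
{
	unsigned flags = 0;
	size_t len = strlen(fmt);
	const char *p;

	/* Only single-sentence messages are checked for a full stop. */
	if (len && fmt[len - 1] == '.' && !strstr(fmt, ". "))
		flags |= SKETCH_TRAILING_STOP;

	/*
	 * An uppercase letter followed by a lowercase one looks like a
	 * capitalized ordinary word; "SHA-3" is not flagged because its
	 * second character is also uppercase.
	 */
	if (len > 1 && isupper((unsigned char)fmt[0]) &&
	    islower((unsigned char)fmt[1]))
		flags |= SKETCH_CAPITALIZED;

	/* Each %s must sit directly between two single quotes. */
	for (p = strstr(fmt, "%s"); p; p = strstr(p + 2, "%s")) {
		if (p == fmt || p[-1] != '\'' || p[2] != '\'')
			flags |= SKETCH_UNQUOTED_SUBJECT;
	}
	return flags;
}
```

With these heuristics, "unable to open '%s'" and "SHA-3 not supported" come back clean, while "path %s contains invalid component" is flagged only for the missing quotes.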
This branch is now known as en/fast-import-path-sanitize.
This patch series was integrated into seen via https://github.com/git/git/commit/66d1ef342ef0bf6b5d01af3246240b7093542c0b.
This patch series was integrated into seen via https://github.com/git/git/commit/abced81afdb8c7bd65086c4b75cd5dc9c7284e35.
On the Git mailing list, Eric Sunshine wrote:
On Wed, Nov 27, 2024 at 7:36 PM Junio C Hamano <gitster@pobox.com> wrote:
> It is more efficient to have something in the coding guidelines
> document to point at, when we want to review and comment on a new
> message in the codebase to make sure it "fits" in the set of
> existing messages.
>
> Let's write down established best practice we are aware of.
>
> Helped-by: Eric Sunshine <sunshine@sunshineco.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> @@ -689,16 +689,29 @@ Program Output
> Error Messages
>
> - - Say what the error is first ("cannot open %s", not "%s: cannot open")
> + - Say what the error is first ("cannot open '%s'", not "%s: cannot open").
> +
> + - Enclose the subject of an error inside a pair of single quotes,
> + e.g. `die(_("unable to open '%s'"), path)`.
> +
> + - Unless there is a compelling reason not to, error messages from the
> + Porcelain command should be marked for `_("translation")`.
Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
> + - Error messages from the plumbing commands are sometimes meant for
> + machine consumption and should not be marked for `_("translation")`
> + to keep them 'grep'-able.
Using the same example, `_("translation")`, for both the "should be"
and "should not be" cases may very well confuse readers. (It certainly
confused me.) Perhaps mirroring the example of an item earlier in the
list would be clearer:
- Unless there is a compelling reason not to, error messages from
porcelain commands should be marked for translation, e.g.
`die(_("bad revision"))`
- Error messages from plumbing commands are sometimes meant for
machine consumption, thus should not be marked for translation,
e.g. `die("bad revision")`
> + - BUG("message") are for communicating the specific error to
> + developers, and not to be translated.
Okay, although could be slightly more explicit:
- BUG("message") is for communicating a specific failure to
developers, not end-users, thus should not be translated.
On the Git mailing list, Junio C Hamano wrote:
Eric Sunshine <sunshine@sunshineco.com> writes:
>> + - Unless there is a compelling reason not to, error messages from the
>> + Porcelain command should be marked for `_("translation")`.
>
> Here you capitalize "Porcelain" but below, "plumbing" is all lowercase.
;-) I think that is how we spell them in our documentation when we
contrast them against each other.
>> + - Error messages from the plumbing commands are sometimes meant for
>> + machine consumption and should not be marked for `_("translation")`
>> + to keep them 'grep'-able.
>
> Using the same example, `_("translation")`, for both the "should be"
> and "should not be" cases may very well confuse readers. (It certainly
> confused me.) Perhaps mirroring the example of an item earlier in the
> list would be clearer:
>
> - Unless there is a compelling reason not to, error messages from
> porcelain commands should be marked for translation, e.g.
> `die(_("bad revision"))`
>
> - Error messages from plumbing commands are sometimes meant for
> machine consumption, thus should not be marked for translation,
> e.g. `die("bad revision")`
Thanks, that is much better. Let me steal it verbatim in the
hopefully final reroll.
>> + - BUG("message") are for communicating the specific error to
>> + developers, and not to be translated.
>
> Okay, although could be slightly more explicit:
>
> - BUG("message") is for communicating a specific failure to
> developers, not end-users, thus should not be translated.
The way I read your rewrite is that the "communication" mentioned is
between the program and the user who saw the message. I wanted to
say that the message is seen first by an end-user, and then is
communicated to developers. And not translating is one way to make
sure the message is not mangled, and stays grep-able, during the
game of telephone.
Would this work better?
- In order to help the user who saw BUG("message") to accurately
communicate it to developers, do not mark them for translation.
Thanks.
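The reason for leaving plumbing messages unmarked can be seen in a toy model. In the sketch below, sketch_gettext() is a hypothetical stand-in for a translation lookup with a single made-up catalog entry; it is not Git's actual `_()` implementation, and the two message functions are likewise invented for illustration.

```c
#include <string.h>

/*
 * Hypothetical translation lookup with one made-up entry. Unknown
 * message ids are returned unchanged, which is the usual fallback
 * behavior assumed here.
 */
const char *sketch_gettext(const char *msgid)
{
	if (!strcmp(msgid, "bad revision"))
		return "revisione non valida"; /* illustrative only */
	return msgid;
}

/* Marking a string for translation routes it through the lookup. */
#define _(msgid) sketch_gettext(msgid)

/* Porcelain-style message: marked, so the user sees the translation. */
const char *sketch_porcelain_message(void)
{
	return _("bad revision");
}

/* Plumbing-style message: unmarked, so scripts see a stable string. */
const char *sketch_plumbing_message(void)
{
	return "bad revision";
}
```

With the toy catalog active, a script that greps for "bad revision" still matches the plumbing message but no longer matches the porcelain one. The same reasoning applies to keeping BUG() text stable when a user relays it to developers.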
Changes since v1:
cc: Eric Sunshine <sunshine@sunshineco.com>
cc: Patrick Steinhardt <ps@pks.im>
cc: "Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com>
cc: Jeff King <peff@peff.net>