conventional-changelog / commitlint

📓 Lint commit messages
https://commitlint.js.org
MIT License

How to handle long URLs in commit messages #2112

Open texhnolyze opened 4 years ago

texhnolyze commented 4 years ago

Expected Behavior

I want to be able to add links with long URLs (over 100 characters) to the footer, so that I can adjust the footer line length in my own rules while still ensuring that the commit message body is formatted consistently.

Alternatively, there should be another way to add long URLs to commits without having to minify them, while still adhering to the commit message max-line-length rules.

Current Behavior

It is impossible to create a commit containing a long URL (over 100 characters) with the default config without minifying it. I would prefer not to minify the URL: depending on the minification service, the minified URL might not stay online as long, and I could lose context information carried by the actual URL, which I could only recover by following the minified one.

Putting a long URL in the body on a single line only works by adjusting the body-max-line-length rule, which reduces the usefulness of the configuration.

Example

docs(adl): commit message style convention

asserted by commitlint to [angular commit style] 
https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum
according to the conventional commits convention.

Putting the URL as a Markdown link reference in the footer does not work either, even when adjusting footer-max-line-length to Infinity, because the reference is recognized as part of the body rather than as the footer.

Example

docs(adl): commit message style convention

asserted by commitlint to [angular commit style] according to the
conventional commits convention.

[angular commit style]: https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum

results in

husky > pre-commit (node v12.18.2)
✔ Preparing...
✔ Running tasks...
✔ Applying modifications...
✔ Cleaning up...
husky > commit-msg (node v12.18.2)
⧗   input: docs(adl): commit message style convention

asserted by commitlint to [angular commit style] according to the
conventional-commits convention

[angular-commit-style]: https://github.com/conventional-changelog/commitlint/tree/master/@commitlint/config-conventional
✖   body's lines must not be longer than 100 characters [body-max-line-length]

✖   found 1 problems, 0 warnings
ⓘ   Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint

husky > commit-msg hook failed (add --no-verify to bypass)

This seems to be due to the conventional-commits-parser package only recognizing the footer when it contains either "BREAKING CHANGE:" or an issue reference such as "#333". I found this out by saving the above commit message to a test.txt file and running the parser over it, which results in:

[
  {
    "type":"docs",
    "scope":"adl",
    "subject":"commit message style convention",
    "merge":null,
    "header":"docs(adl): commit message style convention",
    "body":"asserted by commitlint to [angular commit style] according to the\n[conventional commits] convention.\n\n[angular commit style]: https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum",
    "footer":null,
    "notes":[],
    "references":[],
    "mentions":[],
    "revert":null
  }
]

Affected packages

Possible Solution

A solution could be to supply the conventional-commits-parser with a regex via specific noteKeywords. Another solution could be to introduce an option that excludes single-line URLs in the body of commit messages from the body-max-line-length rule.

But my question would also be how others normally add URLs to commit messages. Do you just always minify them?

escapedcat commented 4 years ago

Hey @texhnolyze ,

Hm, interesting usage. So far I have never experienced this, maybe because we rarely add URLs to commits, or they haven't been over 100 chars. I agree that minifying URLs is kinda bad, because as someone who reads the message I would like to see the real URL.

Personally I would just increase body-max-line-length I guess.
I don't see the need to solve this in the main codebase for now. You could try a plugin maybe, or maybe others will see this issue and jump in. You could also ask in our Slack chat how others handle this or get some feedback.
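
For reference, a local plugin along these lines might look roughly like the following. This is an untested sketch: the rule name body-max-line-length-ignore-urls, the URL regex and the 100-character limit are illustrative placeholders, not an existing commitlint rule:

module.exports = {
  extends: ['@commitlint/config-conventional'],
  rules: {
    'body-max-line-length': [0, 'always'], // disable the built-in check
    'body-max-line-length-ignore-urls': [2, 'always', 100],
  },
  plugins: [
    {
      rules: {
        // Re-implement the body length check, but skip lines containing a URL.
        'body-max-line-length-ignore-urls': ({ body }, _when, limit = 100) => {
          const urlPattern = /https?:\/\/\S+/;
          const tooLong = (body || '')
            .split('\n')
            .filter((line) => line.length > limit && !urlPattern.test(line));
          return [
            tooLong.length === 0,
            `body lines must not be longer than ${limit} characters (lines with URLs excepted)`,
          ];
        },
      },
    },
  ],
};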

iamscottcab commented 4 years ago

Here is my 2c for what it's worth.

Forgive me, because I am not sure if [angular commit style] is a static string or if it is contextual based on the current commit. If it's static and you are always going to use it, then you could easily just pass it in as a parser option.

module.exports = {
    extends: ['@commitlint/config-conventional'], // My repo is using config-conventional you could use anything...
    rules: {
        'footer-max-line-length': [0, 'always'] // Make sure there is never a max-line-length by disabling the rule
    },
    parserPreset: {
        parserOpts: {
            noteKeywords: ['[angular commit style]'] // Create a custom keyword to distinguish the footer
        }
    }
};

In which case the following parses correctly:

docs(adl): commit message style convention

asserted by commitlint to [angular commit style] according to the conventional commits convention.

[angular commit style]: https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum

If [angular commit style] changes, then you might need to create a keyword that lets you recognise the start of a footer, such as link or links:

module.exports = {
    extends: ['@commitlint/config-conventional'], // My repo is using config-conventional you could use anything...
    rules: {
        'footer-max-line-length': [0, 'always'] // Make sure there is never a max-line-length by disabling the rule
    },
    parserPreset: {
        parserOpts: {
            noteKeywords: ['link:'] // Create a custom keyword to distinguish the footer
        }
    }
};

Then you could just add the link keyword to your footer whenever you need to list one.

When you need a link you can simply do the following:

docs(adl): commit message style convention

asserted by commitlint to [angular commit style] according to the conventional commits convention.

link: [angular commit style]: https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum

I would note, though, that in both scenarios I have had to disable the footer line length, because that line (even without the link keyword prepended) is longer than 100 characters anyway. So if you're going to disable line length somewhere, I think it's a trade-off between how much you want to control the body length and how much you want to enforce a particular commit style for footers.

texhnolyze commented 4 years ago

@iamscottcab Nice, that actually helped a lot. I did not realize that you are able to add parserOpts in your configuration. Yeah, [angular commit style] was just an example; it changes depending on the link text I want to use.

Thanks to your examples and a look at the actual implementation of the noteKeywords parser option, I constructed this:

module.exports = {
  extends: ['@commitlint/config-conventional'],
  rules: { 'footer-max-line-length': [1, 'always', 100] },
  parserPreset: { parserOpts: { noteKeywords: ['\\[.+\\]:'] } },
}

This is somewhat of a workaround; I realized that the parser actually constructs a regex from the list of noteKeywords I pass it. It enables me, however, to add any link in the format [any text]: to the footer, which is a pattern I do not expect anywhere within the body of a normal commit message. I also opted to still get a warning for footers longer than 100 characters, so people will have a second look when a long footer is not necessary.
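
With this configuration, the earlier example should now pass, since the reference line matches the \[.+\]: pattern and is treated as a footer (at most triggering the footer-max-line-length warning) rather than as a body line:

docs(adl): commit message style convention

asserted by commitlint to [angular commit style] according to the
conventional commits convention.

[angular commit style]: https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional#type-enum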

I'll close this as my issue is solved. Thanks guys!

aspiers commented 3 years ago

IMHO this isn't really resolved, because noteKeywords is undocumented AFAICS.

I think it would be nicer if it could just spot long URLs and understand that they need to be on a single line and therefore an exception to body-max-line-length. The above workaround definitely seems like a case of the human having to uncomfortably adapt to the whims of the machine, whereas good tooling should work the other way around.

escapedcat commented 3 years ago

IMHO this isn't really resolved, because noteKeywords is undocumented AFAICS.

Happy for a PR here

aspiers commented 3 years ago

What does noteKeywords do exactly? Would need to know that before being able to write any docs.

escapedcat commented 3 years ago

Tried to find some info for this.

It's just being handed to the conventional-commits-parser

noteKeywords is documented here:
https://github.com/conventional-changelog/conventional-changelog/tree/master/packages/conventional-commits-parser#notekeywords

chrisgarber commented 2 years ago

For posterity, could we get a clearly documented summary solution here? The idea is to add noteKeywords, which in turn overrides the rules? My use case doesn't involve putting the link in the footer; does that change any of the above?

dan-turner commented 1 year ago

Absolutely. I would also like to know the recommended approach here.

sdavids commented 10 months ago

The workaround does not work when one has more than one URL, e.g.:

chore: remove eslint-config-prettier

ESLint deprecated formatting rules so we do not need the plugin anymore

https://eslint.org/blog/2023/10/deprecating-formatting-rules

https://github.com/eslint/eslint/pull/17696

Signed-off-by: Sebastian Davids <sdavids@gmx.de>

I would prefer that lines with URLs either be ignored by default or that there be an option to opt in.

And please reopen this issue.

aspiers commented 9 months ago

Let's summarise what we know so far from the above.

Firstly it's clear that this is an issue for more than a small handful of people.

Unsatisfactory workarounds

We have at least three workarounds, but unfortunately they are all really poor:

  1. A workaround based on moving URLs to the footer is far from satisfactory since even if it was documented, it requires users to waste time on configuration and then remember an obscure trick with footers every time they want to mention URLs in commit messages.

  2. Minifying URLs harms readability and also places a brittle dependency on third-party URL redirection services to stay running and maintain all redirects forever.

  3. Increasing body-max-line-length significantly weakens the value of commitlint, since it entirely prevents the use of a helpful linting rule just for the sake of solving an occasional corner case.

Based on the above and the relatively large number of :+1: on my previous comment, we need to find a solution which does the right thing immediately out of the box, rather than forcing users to waste time jumping through hoops.

So IMHO this definitely needs to be addressed within the main codebase.

A user-friendly solution

I can't see any reason why the tool shouldn't simply ignore lines with URLs by default, as @sdavids already suggested:

[EDIT based on below discussion: To clarify, the proposal is to ignore long lines with URLs by default, except for on the first line, since best practice requires keeping the first line as a concise summary of the commit, and putting a long URL on the first line violates this best practice.]

A good first step here would be for the maintainer(s) to state whether they would be open to a PR implementing this solution.

escapedcat commented 9 months ago

Hm, not sure. Isn't the length limit there to make sure a message can be displayed and read in all contexts, like on GitHub, in tig, and in places where only the first ~70 characters are displayed and everything else is hidden? If you have long URLs in the subject, that kinda defeats the purpose? Maybe this will break the current workflow for a lot of users who do not look into this issue? Could this be an optional flag maybe?

nzbart commented 9 months ago

@escapedcat we normally don't have a choice about the length of the URLs that we're including in the commit message, unless we apply the workaround @aspiers mentioned of minifying URLs. Since URLs shouldn't be line wrapped, there is simply no viable alternative to exceeding the 70 character limit that those tools expect. The question for me is how best to deal with the inevitability of exceeding 70 character lines, and I'd add my vote to simply ignoring lines containing URLs as previously suggested.

escapedcat commented 9 months ago

Currently I'm thinking this should be handled via a plugin if this is possible.

I see that this is an issue for some people, but the majority of users would be confused if this behaviour changed. URLs shouldn't be added to the subject but to the body or footer, I believe.

aspiers commented 9 months ago

Hm, not sure. Isn't the length limit there to make sure a message can be displayed and read in all contexts like on github, in tig, in places where only the first ~70 characters are displayed and everything else is being hidden?

Not really. Firstly, I don't think GitHub will ever truncate lines and prevent you from reading whole lines. Secondly, tig has :set wrap-lines = yes which I have just tested and indeed does allow reading of full lines.

But anyway, if a git tool permanently hides git data from you, then that is certainly a bug with that tool which needs to be fixed. And IMHO it doesn't make sense to cripple one tool in order to work around breakage in another tool - that path will lead to misery and despair ;-)

If you have long URLs in the subject that kinda defeats the purpose?

I don't understand this sentence, and I see from this and your more recent comment that maybe I accidentally caused a misunderstanding. I'm assuming that by "subject" you mean the first line of the commit message, right? I don't think it makes sense to have long URLs in the first line; in fact I think it's actively harmful. And from your next comment, it seems you agree :-) But just to make sure we're on the same page, here's my take on this:

The first line should always concisely summarise the commit, and a long URL would be in direct conflict with that.

Consequently the user-friendly solution I'm proposing would ignore long URL lines anywhere except for the first line. Apologies that I didn't explicitly state that before, but hopefully it's clear now.

Maybe this will break the current workflow for a lot of users which do not look into this issue?

I can't think of a single way it would break the current workflow. Please can you give an example?

Could this be an optional flag maybe?

I honestly think that it would be a mistake to not enable this behaviour by default, because a good tool does the most sensible thing out of the box, and for the reasons detailed in my last comment, the user-friendly solution I proposed seems to completely solve the problem whilst not introducing a single additional downside.

But given the choice between it being an option which is off by default, and not having it as an option at all, I'd gladly choose the former.

Currently I'm thinking this should be handled via a plugin if this is possible.

Again, IMHO that would be a mistake because it would impose extra burden on users to set up the plugin when there seems to be no good reason to enable it out of the box.

I see that this is an issue for some people but the majority of users would be confused if this behaviour changed?

Why? Please can you give an example of how they would be confused?

URLs shouldn't be added to the subject but to the body or footer I believe.

As per above I agree 100% with this! The fact that you are mentioning this suggests that I didn't write my proposed solution clearly enough before and as a result you may have misunderstood it. But hopefully it's clear now.

sdavids commented 9 months ago

I also thought that we were talking about the body and footer.

escapedcat commented 9 months ago

@aspiers got it, thanks for clearing this up. At one point I felt like people were talking about the subject; my fault, sorry.

With GitHub and tig etc. I meant the default view, like here. But since that concerns the subject, this is not an issue.

In this case I guess we would be fine with ignoring URLs for body/footer. Still wondering if this would be a breaking change. wdyt?

aspiers commented 9 months ago

Great, thanks! Glad we are aligned now :rocket:

I can't think of any reasons why it would be a breaking change, and I haven't seen anyone else suggest any yet either.

The only scenario I can think of is if someone preferred minified URLs to long URLs and wanted the linter to warn when minification was effectively required. However, as already mentioned above, minified URLs are known to be a bad idea, and this tool is already somewhat opinionated, so personally I don't see an issue here.

That said, if there's a real concern about backwards compatibility then this could easily be solved by adding an option to disable ignoring of long URLs in the body and footer. But even if this option was added, I strongly recommend making the default behaviour to ignore these long URLs, because in most cases, if not all, minification would not be best practice.

escapedcat commented 9 months ago

I can't think of any reasons why it would be a breaking change, and I haven't seen anyone else suggest any yet either.

I'm just afraid of a bigger number of users never looking at this and relying on current behaviour for whatever reason. You never know.

If you're up for a PR, and maybe a follow-up PR (if needed), I'm glad to merge it.

aspiers commented 9 months ago

Yeah, it's impossible to know for sure, but it feels like a pretty safe change to me since it would only affect lines containing long URLs in the body and footer, and nothing else.

To be extra safe and correct, we should require that a long URL line is only ignored when no other words appear on the same line, since other words would make the already long line unnecessarily longer. However I think it would be worth allowing whitespace and punctuation, so that Markdown formatting and similar would be accepted. For example the following commit message would be valid:

docs: add something cool blah blah

We're adding something cool to the docs, based on info
from the following pages:   

  - https://example.com/blah/blah/really/really/really/really/really/long/URL
  - https://example.com/blah/blah/another/really/really/really/really/long/URL

Even if it turns out that there is some weird reason why users really want the old behaviour, we could make it optional in a subsequent release, and in the interim period, those users could simply pin their version to the one prior to this change. So even this worst case scenario should not cause any significant issues for anyone.
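
As a rough sketch of the check described above (my own illustration, not existing commitlint code): strip the URLs from the line and only exempt it from the length rule if nothing but whitespace and punctuation remains:

// True if the line consists only of URLs plus whitespace/punctuation,
// e.g. Markdown bullets or link-reference brackets around the URL.
function isUrlOnlyLine(line) {
  if (!/https?:\/\//.test(line)) return false;
  const withoutUrls = line.replace(/https?:\/\/\S+/g, '');
  return !/[\p{L}\p{N}]/u.test(withoutUrls); // no letters or digits left over
}

isUrlOnlyLine('  - https://example.com/blah/blah/really/really/really/long/URL'); // true, exempt
isUrlOnlyLine('lots of words before https://example.com/short and words after'); // false, still checked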

sw-tracker commented 2 months ago

Side note: is it possible to have something like Prettier prettify the body before committing? That way we ensure that our commit messages stay within the 100-character line-length restriction (not sure if this would fix the long URL problem).

SalahAdDin commented 1 month ago

It is giving us issues, see here.

texhnolyze commented 1 month ago

@aspiers how do you feel about Markdown links then? My original intention when I opened this issue was to be able to link URLs within the commit message with Markdown, as GitHub renders these even in commit messages.

So either with inline links:

feat(subject): added feature

for more context look [here](https://google.com)

or references:

feat(subject): added feature

for more context look [here]

[here]: https://google.com

I personally find this cleaner, which is why I opted for the footer workaround. It is true that this also leads to issues when linking multiple URLs, even with the footer workaround.

Do you think the following rules would make sense:

I think that way would make the most sense personally, but I can see that this could be a breaking change. And of course when introducing a breaking change, one could argue to just ignore any line with a URL in it.

sdavids commented 1 month ago

just ignore any line with a URL in it.

I would prefer just that—every body/footer line with a URL is ignored.


CommonMark

*
-
+
1.
2)

AsciiDoc

*
.

Pandoc

#.

Maybe even:

>
#

The CommonMark link syntax is more complicated than you think, e.g. *[foo*](url) is allowed.

AsciiDoc uses a different syntax.


Going down this route would lead to "Why do you support their syntax but not mine 😤" …

FelixZY commented 1 month ago

I'm not sure I like "ignore every line that contains a link" - there can be cases like:

I have a short link https://bit.ly/abc but this trailing text now exceeds the max line width - whatever should I do?

I tend to use Markdown links when writing some commit messages. I have not checked if they render right in GitHub, but this is what I typically do:

This is a very long line with a long [link](
  https://example.com/some/path/that/is/very/long/and/will/exceed/the/allotted/body/width
) and some text after.

Update

Splitting the link to a new line does not render neatly in GitHub.

aspiers commented 1 month ago

@texhnolyze I agree it's a good idea to look for a solution which supports the various formats of linking which Markdown supports.

That said, I also agree with @sdavids that only supporting Markdown formats and nothing else is problematic. So I'm inclined to agree that maybe the best solution is indeed to ignore every line that contains a URL.

However we first need to make sure we can address @FelixZY's concerns:

I'm not sure I like "ignore every line that contains a link" - there can be cases like:

I have a short link https://bit.ly/abc but this trailing text now exceeds the max line width - whatever should I do?

In this case surely the line can be wrapped:

I have a short link https://bit.ly/abc but this trailing
text no longer exceeds the max line width - wrapping is
what I should do.

After all, keeping lines short where possible is the whole point of the rule. If someone doesn't want to keep lines short, they can simply disable the rule.

I tend to use markdown links when writing some commit messages. I have not checked if they render right in GitHub

According to @texhnolyze's comment they do render right.

but this is what I typically do:

This is a very long line with a long [link](
  https://example.com/some/path/that/is/very/long/and/will/exceed/the/allotted/body/width
) and some text after.

Update

Splitting the link to a new line does not render neatly in GitHub.

You would have multiple options in this case, for example:

This is a very long line with a long [link](https://example.com/some/path/that/is/very/long/and/will/exceed/the/allotted/body/width)
and some text after.

or

This is a very long line with a long link:

- https://example.com/some/path/that/is/very/long/and/will/exceed/the/allotted/body/width

and some text after.

or

This is a very long line with a long [link][] and some text after.

[link]: https://example.com/some/path/that/is/very/long/and/will/exceed/the/allotted/body/width

Summary

I currently believe that the best solution is to ignore every line that contains a URL.

FelixZY commented 1 month ago

After all, keeping lines short where possible is the whole point of the rule. If someone doesn't want to keep lines short, they can simply disable the rule.

I guess I'm happy as long as the rule triggers if the line could have been wrapped.

aspiers commented 1 month ago

@FelixZY commented on August 15, 2024 10:12 PM:

I guess I'm happy as long as the rule triggers if the line could have been wrapped.

Nice idea. I just finished an implementation, and I will see if I can tweak it to do that before submitting a PR.

aspiers commented 1 month ago

Hrm, but how can we detect whether the line could have been wrapped?

For example, if the URL is inside a nested Markdown list:

- first list item
    - second list item
        - some text: https://example.com/now/here/is/a/longish/URL

Let's assume that this line with the URL currently exceeds the maximum line length configured by the rule. How do we know whether it should have been wrapped? If we somehow knew that it's Markdown format, and we had a built-in Markdown parser, we would be able to figure out that it could be written as:

- first list item
    - second list item
        - some text:
          https://example.com/now/here/is/a/longish/URL

and maybe that would now be under the maximum line length without changing how the nested list is rendered.

But there's no way of knowing that it's Markdown format, and as @sdavids already observed, it's certainly not realistic to attempt to implement parsers for every possible format which might be used.

So I don't think there's any good way of achieving your suggestion.

FelixZY commented 1 month ago

I think I meant it more like "if the line could have been wrapped after the URL", meaning that you would only raise an error if the URL is followed by whitespace and non-punctuation characters.

So this would not be an error:

- first list item
    - second list item
        - some text: https://example.com/now/here/is/a/longish/URL
        - some more [text](https://example.com/now/here/is/a/longish/URL)

But this would:

- first list item
    - second list item
        - some text: https://example.com/now/here/is/a/longish/URL and trailing text

aspiers commented 1 month ago

That's a nice idea! However ideally this would also fail the linter:

- first list item
    - second list item
        - a long sentence with a lot of text before the URL and nothing after: https://example.com/now/here/is/a/longish/URL

This is an example of an important point you made which I missed in one of my previous comments (August 15, 2024 10:08 PM):

However we first need to make sure we can address @FelixZY's concerns:

I'm not sure I like "ignore every line that contains a link" - there can be cases like:

I have a short link https://bit.ly/abc but this trailing text now exceeds the max line width - whatever should I do?

In this case surely the line can be wrapped:

I have a short link https://bit.ly/abc but this trailing
text no longer exceeds the max line width - wrapping is
what I should do.

After all, keeping lines short where possible is the whole point of the rule. If someone doesn't want to keep lines short, they can simply disable the rule.

My reply missed your point that in this case the rule should flag that the line needs to be wrapped. It's not good enough to just expect the user to notice that it should be wrapped, because that's the whole point of having a linter. Sorry about that!

Anyway, so the question is whether we can implement a rule which does the "right" thing in all of the above situations.

I have an idea for a possible solution. Instead of "ignore every line that contains a link", we could pretend that any URL seen over x characters long is only x characters long.

For example, let's assume that the max line length is configured to be 80 characters. If we take x as 70, this allows any URL of 70 characters or longer to have 10 other characters (including spaces) on the same line. In other words, that provides a small "safety margin" of 10 characters in addition to the URL's length, which can be used for rich-text formatting, footnote suffixes etc., but still prevents the user from being given free rein to make the line as long as they want just because it has a URL on it. So all of the following lines would pass:

- URL: https://foo.bar.example.com/this/is/a/long/path/in/a/URL/78/chars/long/in/total

https://foo.bar.example.com/this/is/a/long/path/in/a/URL/74/chars/long/in/total (some
text after the long URL, but only one word on the same line)

Here's a [reference-style link][foobar].

[foobar]: https://foo.bar.example.com/this/is/a/long/path/in/a/URL/74/chars/long/in/total

but the following lines would fail:

Here's a short URL https://short.url.com/ within a line which goes over the maximum length and could have been wrapped.

This line exhausts the safety margin https://foo.bar.example.com/this/is/a/long/path/in/a/URL/74/chars/long/in/total 

Follow-on question is whether this value of x would need to be configurable, or whether hardcoding to something like 10 would be good enough.
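
A minimal sketch of that "capped URL length" idea (my own illustration; the regex, function name and default value of x are assumptions, not existing commitlint options):

// Measure a line's length, with every URL longer than x only counting as x
// characters, then compare the result against the configured maximum.
function effectiveLength(line, x = 70) {
  let length = line.length;
  for (const [url] of line.matchAll(/https?:\/\/\S+/g)) {
    if (url.length > x) length -= url.length - x;
  }
  return length;
}

const maxLineLength = 80;
const lineIsOk = (line) => effectiveLength(line) <= maxLineLength;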

sdavids commented 1 month ago

You could just ignore the URL and count the other characters on the line.

01234567890123456789
abcde http://ignore.me abc ❌

01234567890123456789
abcde  abc                 ✅

With CommonMark [text](url) and AsciiDoc url[text] markup only url would be ignored, i.e. not counted.
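
In code, that counting approach might be as simple as the following (illustrative only; the regex is deliberately naive and would need refinement for [text](url)-style markup):

// Strip URLs before measuring the line, so only the surrounding text counts.
const nonUrlLength = (line) => line.replace(/https?:\/\/\S+/g, '').length;

nonUrlLength('abcde http://ignore.me abc'); // 10 — within a 20-character limit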

aspiers commented 1 month ago

That would cause examples like these to pass when we want them to fail:

Here's a short URL https://short.url.com/ within a line which goes slightly over the maximum length.
This line with URL https://foo.bar.example.com/this/is/a/long/path/in/a/URL/74/chars/long/in/total far exceeds
the safety margin and could have been more aggressively wrapped to
minimize how much it goes over the maximum.

I think the algorithm has to take into account a) the length of the URL and b) the number of other characters on the same line as the URL, otherwise it won't be able to handle all these cases correctly.

FelixZY commented 1 month ago

@aspiers I don't like that the user is able to add a word after the URL if that word goes outside the line limit. Couldn't you do something like:

// Pseudo
foreach (line in commitMessage) {
  if (line.length < limit) continue;

  // [
  //  "Lorem ipsum dolor [sit](",
  //  "https://example.com", 
  //  ") amet ",
  //  "https://example.com/2"
  // ]
  parts = line.split(urlPattern)

  len = 0
  foreach (part in parts) {
    len += part.length

    if (len >= limit && !part.isUrl && part.containsRegex(/\w/)) {
      return false
    }
  }
}
return true

aspiers commented 1 month ago

I think there are valid use cases for having text after the URL, e.g. a link written in Org format:

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][the previous link in Org format]]

Also, that logic doesn't take into account the length of any URLs on the line, so even with a single URL on the line it could go substantially over the limit by allowing a lot of non-URL words. And it would allow multiple URLs, meaning that total line length is effectively unlimited.

I can code my proposal either tomorrow or this weekend, with some test cases covering all the scenarios we discussed. Then if someone else can propose a better algorithm where all those test cases still pass, I'm all ears :-)

FelixZY commented 1 month ago

Also that logic doesn't take into account the length of any URLs on the line, so even with a single URL on the line it could go substantially over limit due to allowing a lot of non-URL words.

My pseudo code adds the length of the URL to len so it is indeed taken into account.

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][the previous link in Org format]]

I see your point regarding such formats. However, I can't see a clean solution for such cases aside from simply ignoring any line containing URLs. The above example would fail your padding method as well, no?

Perhaps there need to be two modes:

  • "smart" (which does what my pseudo code above does) and
  • "ignore" (which ignores any lines found to contain a URL)

sdavids commented 1 month ago

Using a URL shortener is a bad idea from a security standpoint.

Also, one is at the whim of a third party—commit messages cannot be updated (without history rewriting).


UUID

01234567890123456789012345678901234567890123456789
http://a.de/919108f7-52d1-4320-9bac-f847db4148a8 ⇒ 48 characters
http://a.de/919108f752d143209bacf847db4148a8     ⇒ 44 characters

Snowflake ID

01234567890123456789012345678901234567890123456789
http://a.de/1822511705628868608 ⇒ 31 characters

Nano ID (can be configured to be shorter)

01234567890123456789012345678901234567890123456789
http://a.de/V1StGXR8_Z5jdHi6B-myT ⇒ 33 characters

Cuid2 (can be configured to be shorter)

01234567890123456789012345678901234567890123456789
http://a.de/tz4a98xxat96iws9zmbrgj3a ⇒ 36 characters

ULID

01234567890123456789012345678901234567890123456789
http://a.de/01ARZ3NDEKTSV4RRFFQ69G5FAV ⇒ 38 characters

So we are talking about hand-crafted URLs, pages with short titles (and therefore short slugs), or sites using numeric IDs (maybe converted to hex to make them shorter).

I am not sure if it is worth handling these rare cases and/or introducing another option for them.

sdavids commented 1 month ago

What we should handle, though, is Internationalized Domain Names (Punycode).

https://münchen.de

https://xn--mnchen-3ya.de

Regular Expressions Cookbook - Validating Domain Names

FelixZY commented 1 month ago

@sdavids I don't believe using a URL shortener or not supporting international domain names has been suggested?

aspiers commented 1 month ago

Right. I'm strongly against URL shorteners being considered as a valid solution, as stated in this previous comment.

I'm also aware that we need to support international URLs and other edge cases. The code I've already written uses a deliberately lax regular expression for exactly that reason.
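
For illustration only (this is not necessarily the pattern used in that code), a deliberately lax match might just grab anything from http(s):// up to the next whitespace, so IDNs, Punycode hosts, IP addresses, credentials and ports all match without any attempt at validation:

const urlPattern = /https?:\/\/\S+/u;

urlPattern.test('https://münchen.de');                    // true
urlPattern.test('https://xn--mnchen-3ya.de');             // true
urlPattern.test('http://user:pw@203.0.113.7:8080/path');  // true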

sdavids commented 1 month ago

My comment about IDNs was a reminder that they exist—a lot of people forget about them 😉


Without a URL shortener you will almost always have long URLs—and yes, "long" is in the eye of the beholder (depending on your sensibility the link may be NSFW).

That's why I suggest going with a simple solution (ignore the line completely, or ignore the URL's characters).

Introducing yet another len option/consideration would increase the complexity of the implementation and would need a lot more tests, i.e. a lot of new code to maintain.

Yes, we are programmers and we like 100% solutions, edge cases, and complexity, but sometimes Pareto is OK too 😉

dan-turner commented 1 month ago

Right. I'm strongly against URL shorteners being considered as a valid solution, as stated in this previous comment.

I'm also aware that we need to support international URLs and other edge cases. The code I've already written uses a deliberately lax regular expressions for exactly that reason.

I cannot believe a URL shortener was even remotely considered for a second 🙈.

I get that this package is all about enforcing strong opinions for the sake of encouraging quality commit messages, but this is madness. A quality commit also needs to be able to include relevant context information, which sometimes means long URLs. If we go back to first principles, the only sane option is to find a solution that makes an exception for the rare situations where long URLs need to be included.

aspiers commented 1 month ago

Violently agree @dan-turner! :grin: I said the same thing above, although the issue is very long now.

@sdavids wrote:

My comment about IDNs was a reminder that they exist—a lot of people forget about them 😉

Indeed! There are so many corner cases in URLs, not just IDNs, but also IP addresses, usernames/passwords etc.

Without a URL shortener you will almost always have long URLs—and yes, "long" is in the eye of the beholder (depending on your sensibility the link may be NSFW).

:rofl: :rofl: :rofl:

That's why I suggest going with a simple solution (ignore the line completely, or ignore the URL's characters).

Introducing yet another len option/consideration would increase the complexity of the implementation and would need a lot more tests, i.e. a lot of new code to maintain.

Yes, we are programmers and we like 100% solutions, edge cases, and complexity, but sometimes Pareto is OK too 😉

I think this is an excellent point. I already have code which just completely ignores any line with a URL, and it works great. However, I was trying to go one step further and address @FelixZY's concerns too, and it does start to get tricky quite quickly. Continuing the earlier solution-design conversation below may be further evidence of that:

@FelixZY commented on Aug 16, 2024, 1:38 AM GMT+1:

Also, that logic doesn't take into account the length of any URLs on the line, so even with a single URL on the line it could go substantially over the limit by allowing a lot of non-URL words.

My pseudo code adds the length of the URL to len so it is indeed taken into account.

Apologies! I misread your code.

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][the previous link in Org format]]

I see your point regarding such formats. However, I can't see a clean solution for such cases aside from simply ignoring any line containing URLs.

The "safety margin" approach I proposed above should handle it fine, but it would most likely lead to another configuration option for how much of a safety margin to allow.

The above example would fail your padding method as well, no?

It would, but my method would allow wrapping like this:

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][the
previous link in Org format]]

IIUC, the approach described by your code wouldn't allow the trailing ][the if the text prior to that already exceeded the max line length.

Perhaps there need to be two modes:

  • "smart" (which does what my pseudo code above does) and
  • "ignore" (which ignores any lines found to contain a URL)

Could be. Again that comes down to a choice between simplicity and the quest for perfection. Ultimately the maintainers of this repo would be the ones to decide which they prefer. I think I'll just submit a PR for the simple solution for now, since:

FelixZY commented 1 month ago

IIUC, the approach described by your code wouldn't allow the trailing ][the if the text prior to that already exceeded the max line length.

That's correct. However, my code would allow ][.

IMO, this is bad:

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][the
previous link in Org format]]

whereas this is fine:

[[https://orgmode.org/worg/org-syntax.html#orgcf3f3fa][
the previous link in Org format]]

The reason for this is that the trailing word could be e.g. "not", which would invert the meaning of any following text.

As I see it, the reason for a line length limit is readability - you should not need to scroll horizontally to read and understand the full message. URLs may by their very nature be longer than the line length limit, which might render like this:

Here's my commit message. It's very good
and descriptive. Also: https://example.>
has good information.

However, I think that's fine, given that you are still able to see that there's a URL there. If we start allowing trailing words, though, then you will still need to scroll horizontally to make sure you have read the full message.

aspiers commented 1 month ago

I agree with all those points; however it still leaves the problem of the example in this comment:

- first list item
    - second list item
        - a long sentence with a lot of text before the URL and nothing after: https://example.com/now/here/is/a/longish/URL

which would pass the check. That's definitely undesirable as per your original request:

I guess I'm happy as long as the rule triggers if the line could have been wrapped.

FelixZY commented 1 month ago

That's definitely undesirable as per your original request:

I guess I'm happy as long as the rule triggers if the line could have been wrapped.

I think we're talking about different things here - my intended meaning with this was that I think just ignoring any line that contains a URL is a bad idea. If we have a line which contains a URL as well as trailing text, I'd want the rule to trigger for the trailing text if the combined width exceeds the specified limit.

That's definitely undesirable

I'm not sure I agree that your example is "problematic" per se. Sure, it seems a bit strange to put the URL at the end, but as long as the URL starts within the line length limit, I think I'm fine with it. If we knew that the commit message consisted of e.g. Markdown syntax, we would be able to determine more intelligently that the URL can be moved to the next line. However, without such knowledge, I think simply allowing it is the best we can do.

If you do want to address it, I would suggest the rule should rather check whether "enough" of the URL is visible in the current line so that a reader can clearly see it is a URL. Going back to pseudocode, as I think that makes things clearest:

 // Pseudo
 foreach (line in commitMessage) {
   if (line.length < limit) continue;

   // [
   //  "Lorem ipsum dolor [sit](",
   //  "https://example.com", 
   //  ") amet ",
   //  "https://example.com/2"
   // ]
   parts = line.split(urlPattern)

   len = 0
   foreach (part in parts) {
     len += part.length

-    if (len >= limit && !part.isUrl && part.containsRegex(/\w/)) {
+    if (len >= limit && (
+        !part.isUrl && part.containsRegex(/\w/) ||
+        part.isUrl && len - part.length > limit - "https://".length
+    )) {
       return false
     }
   }
 }
 return true
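
To make the pseudo-code concrete, here is a rough, runnable JavaScript reading of the updated logic (illustrative only — the URL regex, the function name and the example limit are placeholders, not commitlint API):

// Split each line on URLs (keeping them), accumulate the running length, and
// flag either trailing non-URL words past the limit or URLs that start so
// late that less than "https://".length of the limit remains for them.
const urlPattern = /(https?:\/\/\S+)/;

function lineIsAcceptable(line, limit) {
  if (line.length < limit) return true;

  const parts = line.split(urlPattern); // URLs are kept as their own parts
  let len = 0;
  for (const part of parts) {
    len += part.length;
    const isUrl = /^https?:\/\//.test(part);
    if (
      len >= limit &&
      ((!isUrl && /\w/.test(part)) ||
        (isUrl && len - part.length > limit - 'https://'.length))
    ) {
      return false;
    }
  }
  return true;
}

// With an example limit of 40 (for demonstration only):
lineIsAcceptable('some text: https://example.com/now/here/is/a/longish/URL', 40); // true
lineIsAcceptable('some text: https://example.com/now/here/is/a/longish/URL and trailing text', 40); // false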

aspiers commented 1 month ago

Yeah, it comes down to what we mean by "could have been wrapped". As you say, if there's trailing text then it's fairly clear that there's a potential for wrapping, but if the text is only preceding the URL then it's far less clear. In fact, this is the same point I already made, which I'd somewhat forgotten about in the course of this lengthy and nuanced discussion.

Thanks for the updated pseudo-code. For the reasons already stated, I'm going to just submit the simple "ignore any line with a URL" approach for now, and then if the maintainers have an appetite for it, we can potentially build a more sophisticated solution on top of that. Your updated approach certainly sounds reasonable to me.