codeigniter4 / CodeIgniter4

Open Source PHP Framework (originally from EllisLab)
https://codeigniter.com/
MIT License
5.37k stars 1.9k forks source link

Bug: auto_link() fails to include trailing slashes in URLs #9165

Closed corenominal closed 1 month ago

corenominal commented 2 months ago

PHP Version

8.2

CodeIgniter4 Version

4.5.4

CodeIgniter4 Installation Method

Composer (using codeigniter4/appstarter)

Which operating systems have you tested for this bug?

Linux

Which server did you use?

apache

Database

No response

What happened?

Using the URL Helper auto_link() function to automatically turn URLs contained in a string into links/HTML anchor elements. If the last character of the URL is a /, it is not included in either the anchor href value or text value.

Steps to Reproduce

Input:

$string = 'This is a test link to https://example.com/.';
$html = auto_link($string);
echo $html;

Output:

This is a test link to <a href="https://example.com">https://example.com</a>/.

Expected Output

This is a test link to <a href="https://example.com/">https://example.com/</a>.

Anything else?

I came across this as it seems that the default behaviour of Chrome and Firefox is to include the trailing slash when copying the URL from the browser's address bar. I tend to do this a lot and noticed that the auto_link() function always missed the trailing slash when creating anchor elements.

kenjis commented 2 months ago

158 The first commit of auto_link().

The regex is not changed from the beginning.

preg_match_all('#(\w*://|www\.)[^\s()<>;]+\w#i', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)

But the regex in CI3 is the following:

preg_match_all('#(\w*://|www\.)[a-z0-9]+(-+[a-z0-9]+)*(\.[a-z0-9]+(-+[a-z0-9]+)*)+(/([^\s()<>;]+\w)?/?)?#i', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)

It seems it was changed in https://github.com/bcit-ci/CodeIgniter/commit/8c9e51044d991868228dba8b9d5141998347dbfe. But before the change, there was already /? (not …+\w#i but …+\w/?#i) in the end.

corenominal commented 2 months ago

Thanks for looking into this. I messed up with my PR, apologies.

kenjis commented 2 months ago

No problem. It is almost impossible for anyone to send a perfect PR at first.

I sent #9169. Please check.

kenjis commented 2 months ago

@codeigniter4/core-team This is clearly documented, but why auto_link() recognizes a string starting with ://?

The link <a href="://codeigniter.com">://codeigniter.com</a> does not seem to work. If I put the link in the Welcome page, the URL will be http://localhost:8080/://codeigniter.com.

Note The only URLs recognized are those that start with www. or with ://. https://codeigniter4.github.io/CodeIgniter4/helpers/url_helper.html#auto_link

https://github.com/codeigniter4/CodeIgniter4/blob/ffd1b68b0d3fedd79bb26695d2e1e7203fab4af7/tests/system/Helpers/URLHelper/MiscUrlTest.php#L525-L526

michalsn commented 2 months ago

I guess that :// would support cases for: https, http, ftp, etc.

kenjis commented 2 months ago

Yes, 'test abc://www.codeigniter.com test' will be 'test <a href="abc://www.codeigniter.com">abc://www.codeigniter.com</a> test'. But it does not make sense.

michalsn commented 2 months ago

The range of what will "catch" on this rule is large, but thanks to this we have support for all direct links that are supported by many apps, like slack:// - https://api.slack.com/reference/deep-linking#client

neznaika0 commented 2 months ago

Telegram API https://core.telegram.org/api/links

kenjis commented 2 months ago

Okay, the current behavior seems useful, and can catch unknown schemes. But catching ://codeigniter.com does not make sense. Should it catch [a-z0-9]+://?

corenominal commented 2 months ago

RFC 3986 states that the scheme component is required.

The scheme and path components are required, though the path may be empty (no characters).

I would think that it should catch [a-z0-9]+://.

kenjis commented 1 month ago

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) https://datatracker.ietf.org/doc/html/rfc3986#section-3.1

[a-z][a-z0-9+\-.]*:// is better?

michalsn commented 1 month ago

[a-z][a-z0-9+\-.]*:// is better?

Yes.

kenjis commented 1 month ago

I sent #9180