brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0
17.93k stars 2.35k forks source link

Check if final debounce URL's hostname has at least eTLD + 1 #23580

Open ShivanKaul opened 2 years ago

ShivanKaul commented 2 years ago

For AMP URLs (and there will probably be others), there are cases where the embedded target URL in the original URL that we want to debounce does not have a scheme, for e.g. https://theguadian.ampproject.org/c/s/www.newsarticleonguardian.com -- www.newsarticleonguardian.com is not a valid URL (and hence not parsed as one at least in Chromium and also by iOS it looks like) because it does not have a scheme. We create the right debounce URL by using a new property called prepend_scheme : http | https as described in https://github.com/brave/brave-browser/issues/23121

The trouble is that if the original URL looks something like https://theguadian.ampproject.org/c/s/foo, and prepend_scheme is https, then we debounce to https://foo/, which is not actually a valid website, even though https://foo/ is technically a valid URL.

The proposed solution is (@fmarier):

  1. Once you have a valid URL, extract the hostname.
  2. Check that the hostname has at least an eTLD + 1 (e.g. example.com -> [example, com] :white_check_mark:; http://foo/ -> [foo] :x:).
  3. Reject the URL if the hostname doesn't have at least an eTLD + 1.

We should do this for all debounce actions.

ShivanKaul commented 2 years ago

@pilgrim-brave @pes10k

pilgrim-brave commented 2 years ago

Sure

pes10k commented 2 years ago

lgtm