Closed ronosm closed 2 years ago
+1
here:
https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L333-L338
https://
resource from http://
origin creates CORS issues.while new URL('//my.url')
breaks without protocol, i don't think it should be up to sanitize-html
to enforce the scheme of URLs when no allowedScriptHostnames
is defined?
i.e. const parsed = new URL(value);
should move inside the if
block?
https://github.com/apostrophecms/sanitize-html/blob/3cdc262/index.js#L340-L348
edit: that probably applies to iframe src
too
tag @yorickgirard
Is this happening in the browser only? I believe the WHATWG URL parser in nodejs supports it.
A small subclass of the URL class to work around this issue probably wouldn't be difficult to contribute as a PR. The idea being to stub in the https: protocol but then stub it out again in toString if it was stubbed in.
(As a PR on this module that is. Modifying URL upstream is unrealistic of course.)
@boutell , I guess what I'm saying is that in the absence of explicit allow-list of domains, it shouldn't even try to police URLs.
$ node
Welcome to Node.js v16.14.0.
Type ".help" for more information.
> new URL('//google.com')
Uncaught TypeError [ERR_INVALID_URL]: Invalid URL
at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
at new NodeError (node:internal/errors:371:5)
at onParseError (node:internal/url:552:9)
at new URL (node:internal/url:628:5) {
input: '//google.com',
code: 'ERR_INVALID_URL'
}
> new URL('ftp://google.com');
URL {
href: 'ftp://google.com/',
origin: 'ftp://google.com',
protocol: 'ftp:',
username: '',
password: '',
host: 'google.com',
hostname: 'google.com',
port: '',
pathname: '/',
search: '',
searchParams: URLSearchParams {},
hash: ''
}
>
I don't think it would be a good idea to accept invalid URLs, but I agree that protocol relative URLs should not be considered invalid, at least by default, at least not yet. This is why I'm suggesting using a subclass wrapper for URL
that accepts that particular case without reimplementing URL
.
(The lack of support for protocol relative URLs in Node 16 does increase the priority on our end but it's not something we can immediately schedule here as we don't have a client requirement for it right now.)
On Tue, Feb 22, 2022 at 9:17 AM Georgi Marinov @.***> wrote:
@boutell https://github.com/boutell , I guess what I'm saying is that in the absence of explicit allow-list of domains, it shouldn't even try to police URLs.
$ node Welcome to Node.js v16.14.0. Type ".help" for more information.
new URL('//google.com') Uncaught TypeError [ERR_INVALID_URL]: Invalid URL at __node_internal_captureLargerStackTrace (node:internal/errors:464:5) at new NodeError (node:internal/errors:371:5) at onParseError (node:internal/url:552:9) at new URL (node:internal/url:628:5) { input: '//google.com', code: 'ERR_INVALID_URL' }
new URL('ftp://google.com'); URL { href: 'ftp://google.com/', origin: 'ftp://google.com', protocol: 'ftp:', username: '', password: '', host: 'google.com', hostname: 'google.com', port: '', pathname: '/', search: '', searchParams: URLSearchParams {}, hash: '' }
— Reply to this email directly, view it on GitHub https://github.com/apostrophecms/sanitize-html/issues/531#issuecomment-1047842026, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAH27P7462HKMFC27FIE4TU4OLH5ANCNFSM5M57AQYA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you were mentioned.Message ID: @.***>
--
THOMAS BOUTELL | CHIEF TECHNOLOGY OFFICER APOSTROPHECMS | apostrophecms.com | he/him/his
it's just that till a few updates ago, sanitize-html didn't have this issue, right?
and in the browser, where the sanitized html ends up, protocol-relative URLs are not invalid.
src
has always been passed through naughtyHref
no matter what the tag is. I think what we're seeing is that Node 16 now has the same strict policy on protocol relative URLs that is enforced by Safari. A more tolerant subclass of Url would resolve it.
It's not possible to add a protocol-relative URL in the src attribute from script tag.
At naughtyHref function it is handled, but after that this code is executed:
const parsed = new URL(value);
This code generates an exception when a protocol-relative URL is used, generating an exception making allowed value as false:
} catch (e) { allowed = false; }
Finally, it makes the src attribute is removed from script tag.
This happens between lines 325 and 355.