HTML, CSS, and JSON modules shouldn't solely rely on MIME type to change parsing behavior

rniwa commented 5 years ago

As we discussed in TPAC 2019 Web Components session, the current proposal / spec of HTML, CSS, and JSON modules do not specify the type of content in the import statement.

This is problematic because an import statement that intended to load CSS or JSON and not execute arbitrary scripts could end up executing scripts if the destination server's MIME type got changed or the destination server get compromised.

In general, we've made so that the importer of any content can specify how the imported content should be parsed & processed. This is one of the motivations for adding CORS fetch for JSON as opposed to JSONP for example.

rniwa commented 5 years ago

@annevk @dandclark @domenic

Jamesernator commented 5 years ago

I think there's two separate parts to this:

Treating some response as another/specific format
Preventing evaluation of unexpected formats

I personally think the second one should be part of Content-Security-Policy rather than changing anything about the current module loading e.g. Content-Security-Policy: modules self *, https://some.config.tld json, https://fonts.somewhere.tld css.

For the first one maybe we could extend the import: protocol to have import+css:/import+html/etc to force it into a specific format.

trotyl commented 5 years ago

This is problematic because an import statement that intended to load CSS or JSON and not execute arbitrary scripts could end up executing scripts if the destination server's MIME type got changed or the destination server get compromised.

I believe this functionality is required in polyfilling, to support browsers not yet supports HTML/JSON/CSS modules, a server can just respond the corresponding JavaScript file that wrapping original content to default export, which could provide a consistent code style at consumer-side.

rniwa commented 5 years ago

This is problematic because an import statement that intended to load CSS or JSON and not execute arbitrary scripts could end up executing scripts if the destination server's MIME type got changed or the destination server get compromised.

I believe this functionality is required in polyfilling, to support browsers not yet supports HTML/JSON/CSS modules, a server can just respond the corresponding JavaScript file that wrapping original content to default export, which could provide a consistent code style at consumer-side.

Weakening the security model for the sake of polyfilling is an unacceptable trade off in our view.

justinfagnani commented 5 years ago

Imports are inherently dangerous, and this is why CSP allows for restricting the origin for scripts. That should apply to all imports, regardless of type.

Importing JSON should only be done for trusted sources, and shouldn't in general be done for calling third party APIs (and arguably shouldn't be done for 1st party APIs if you don't want your module graph to fail to load if the API call fails). fetch() is much more appropriate for that.

Regarding server-side polyfills, this should still work if the client sends the appropriate Accept header.

rniwa commented 5 years ago

Imports as inherently dangerous, and this is why CSP allows for restricting the origin for scripts. That should apply to all imports, regardless of type.

Given that many websites don't use CSP correctly, relying on websites to correctly deploy CSP to get the right security behavior is not a great plan.

Furthermore, today, if you were to fetch JSON via XHR or fetch API and parse it via JSON.parse, there is no chance of the fetched JSON suddenly executing as scripts. This new module loading mechanism, therefore, is a functional regression from existing loading mechanisms.

We believe this security issue is a show stopper issue for HTML, CSS, and JSON modules.

annevk commented 5 years ago

Yeah, Mozilla does as well. Importing non-scripts should be safe by default. (If HTML modules end up executing script they might not necessarily be problematic.)

matthewp commented 5 years ago

What is the counter proposal?

dandclark commented 5 years ago

I think @justinfagnani's point here is interesting and I want to second it. Even aside from security concerns, JSON imports seem like the wrong tool for consuming non-first-party JSON/CSS. If the request 404's or if, say, the JSON has a parse error, the entire module graph will fail to instantiate/execute. fetch() is the right tool in this scenario, with import being reserved for content that is directly controlled by the importer. Otherwise all your module scripts could fail to run because of a missing { in third-party JSON...

annevk commented 5 years ago

That doesn't apply to import() afaik.

Jamesernator commented 5 years ago

If the request 404's or if, say, the JSON has a parse error, the entire module graph will fail to instantiate/execute.

This might actually be what you want in some cases, e.g. if a critical resource does fail to load then you don't want to waste resources evaluating a bunch of modules you don't need. You can always use import() to catch errors in that particularly and do some recovery code or retry.

rniwa commented 5 years ago

I think @justinfagnani's point here is interesting and I want to second it. Even aside from security concerns, JSON imports seem like the wrong tool for consuming non-first-party JSON/CSS.

Regardless of whether it's a good idea or not, some web developers are inevitably going to do it. We shouldn't be adding a new foot gun to the Web so that authors can avoid using it in certain situations to avoid creating a new security surface.

caridy commented 5 years ago

I can only second @rniwa here, @justinfagnani's point included a lot of "should/shouldn't", and we all know how that goes. Importing non-scripts must be safe by default.

littledan commented 5 years ago

Is it OK for WebAssembly modules to have parsing behavior based on their MIME type? That's what the current proposal does.

joeldenning commented 5 years ago

HTML, CSS, and JSON modules do not specify the type of content in the import statement.

Are there any proposals / ideas that could solve these issues?

At the risk of suggesting something naive, could the file extension be the way that the import statement specifies its expected module type?

 // both the file extension and mime type must be present for the non-script module to load
import 'file.json'
import 'file.css'
import 'file.html

The obvious limitation of this approach is that you can't load json, css, and html modules from urls without the proper file extension. The advantage is that it doesn't require a rework of import statements and import maps to make html, css, and json modules work.

annevk commented 5 years ago

Apart from plugins there's no precedent for putting meaning in file extensions within the web, I don't think we should start here.

rniwa commented 5 years ago

Is it OK for WebAssembly modules to have parsing behavior based on their MIME type? That's what the current proposal does.

That’s not ideal but less problematic because the expectation of loading WASM & JS are similar: they execute arbitrary scripts. It’s not so with CSS & JSON.

littledan commented 5 years ago

If we had syntax in JavaScript for asserting the MIME type (mandatory for JSON modules, and optional for JavaScript), would that address this concern? If so, we can look into this issue in TC39; I don't think we have considered it before. As a strawperson, it could look like this:

import document from "./foo.json" with mime: "application/json";

This could be a basis for adding sending metadata to the host environment (HTML, Node.js, etc). Thoughts? Did people have other specific ideas for what this should look like?

annevk commented 5 years ago

Nothing concrete was discussed, but yeah, something like that would address the concern. And equivalent for import(). Instead of the MIME type it might be nicer to assert it being "json" or "css" or some such.

ljharb commented 5 years ago

The author is the only true arbiter of the parse goal of content; would this be an assertion, or would it change the parsing like script type does, for example?

littledan commented 5 years ago

@ljharb I imagine we would continue to use MIME type in conjunction with the declaration within JS. That's why I used the word "assertion".

devsnek commented 5 years ago

how about the CSP idea but the default is disallow instead of allow? I think the layering of extending import syntax is a bit iffy, since it adds host-specific semantics to an otherwise generic bit of code, and then random hosts have to be like "wait what do i do with this"

annevk commented 5 years ago

It doesn't really seem host-specific to be able to import JSON without that resulting in script execution later on.

devsnek commented 5 years ago

as an example, node requires you to write .json in the import specifier. we would have no use for such a syntactic extension. jumping a bit further, if someone writes import x as 'application/json', but a host doesn't use mine types, what do they do with that?

littledan commented 5 years ago

Right, somehow, this is syntactically redundant in environments where the interpretation is implied by the module specifier's suffix already. The web doesn't have a tradition of making such judgements.

I suppose the analogous (and unprecedented?) thing here would be something like requiring that, if the MIME type is application/json, then the module specifier must end in .json, and prohibiting JS modules with this suffix. But such a scheme faces web compatibility issues with growing over time.

If we require this syntax for JSON modules on the web, I think there is some chance that a common authoring format will omit the assertions, and tools will insert it when generating web output as part of a build process. But there is also a chance that we can convince most people to write this directly.

devsnek commented 5 years ago

sorry, to be more direct with what i'm trying to say: given security is host specific, shouldn't this assertion be out-of-band from the import? at worst you would have hosts ignoring a check people expected would be enforced.

annevk commented 5 years ago

Inferring meaning from a file extension is incompatible with the web's architecture. Requiring out-of-band annotations seems like it would create such bad ergonomics the feature would effectively not be used on the web.

devsnek commented 5 years ago

I'm not necessarily suggesting using file extensions. Also, a lot of things on the web are out of band (csp in headers, import mapping in a html tag, etc). I'm not sure why this specifically would be worse. If you're worried about people not using it, you can default to disallow, as I keep saying.

annevk commented 5 years ago

I was assuming it was a given that it defaults to disallow, but that also makes it a pain to use the feature (having to set up CSP correctly) compared to fetching and parsing as JSON.

devsnek commented 5 years ago

sure, but turning loading a module into a support matrix is a pain for everyone else. In node we've done everything out of band from import because of this, so I am hopeful the web can do the same.

joeldenning commented 5 years ago

Out of curiosity, were those at the TPAC 2019 General Session wanting to work through the security issues to see HTML, CSS, and JSON modules implemented?

Or was the feeling more that this was a showstopper that completely blocks those module specifications from going forward?

The responses in this github thread seem to shut down proposed solutions, without proposing an alternative solution. Which makes me think maybe those at the TPAC 2019 session are leaning towards shutting down these specifications entirely?

rniwa commented 5 years ago

Or was the feeling more that this was a showstopper that completely blocks those module specifications from going forward?

For WebKit, this is an absolute show stopper. We don't think we should ever implement the support for HTML, CSS, or JSON module without this issue being addressed adequately.

The responses in this github thread seem to shut down proposed solutions, without proposing an alternative solution. Which makes me think maybe those at the TPAC 2019 session are leaning towards shutting down these specifications entirely?

There are many standards proposals that are shutdown due to privacy, security, and other considerations. It's usually the burden of those who are seeking to propose or add a new standards to come up with an appropriate solution, not of those who pointed out issues with a proposal.

justinfagnani commented 5 years ago

So if extensions and out-of-band solutions are out, I think that leaves us with two basic approaches:

Non-extension, in-specifier syntax.
JavaScript syntax

Does this sound right?

I heard that in-specifier syntax might have been discussed already. Can anyone shed some light on that? Was it something like

import styles from 'import+css:./styles.css';

For JavaScript syntax options, I presume this interacts with other requests over time, like SRI hashes for imports. Has there been a proposal for generic metadata options to be added to import directives?

@littledan

If we require this syntax for JSON modules on the web, I think there is some chance that a common authoring format will omit the assertions, and tools will insert it when generating web output as part of a build process. But there is also a chance that we can convince most people to write this directly.

Build steps happen at multiple points in the edit-to-production pipeline. A build step could insert these assertions to libraries before publishing to npm, for instance, which would be pretty invisible to package consumers.

littledan commented 5 years ago

For JavaScript syntax options, I presume this interacts with other requests over time, like SRI hashes for imports. Has there been a proposal for generic metadata options to be added to import directives?

Not that I know of. If someone is interested in championing this in TC39, I would be very happy to help them through the process.

Build steps happen at multiple points in the edit-to-production pipeline. A build step could insert these assertions to libraries before publishing to npm, for instance, which would be pretty invisible to package consumers.

No disagreement with you there. I don't have any particular value judgements on this potential end state.

annevk commented 5 years ago

I think the options are:

A different identifier.
Dedicated JavaScript syntax.
Mandatory for non-JavaScript/Wasm modules out-of-band annotation (via CSP or Import Maps or some such). (If HTML ends up with the ability to execute script we might not have to require an annotation there either.)

bahrus commented 5 years ago

Does the full mime type really need to be specified? It would nice to provide notation that also could, in the future, support specifying inline data within a js file, just as html can do with js / css references.

I liked the idea proposed here:

import styles from './styles.css' as StyleSheet;

Later, once this is implemented, a follow-up proposal would be to allow inline text, which the browser could optimize around, knowing what format to expect:

import styleSheet from string `
div { color: green; }
` as StyleSheet;

Or was that proposal found problematic?

justinfagnani commented 5 years ago

@annevk @rniwa was encoding the module type in the scheme discussed at TPAC? Is there a critical flaw in that approach that eliminates the option?

annevk commented 5 years ago

@justinfagnani that sounds like 1 above (a different identifier). See also https://github.com/w3c/webcomponents/issues/839#issuecomment-536200346. To reiterate, the security concern was raised in the meeting, there's no concrete alternative proposed, but as mentioned above there's a couple of potential paths.

littledan commented 5 years ago

Just to clarify, in TC39, we usually use the term "module specifier" for what I think @annevk is calling "identifier" in this thread.

devongovett commented 5 years ago

FWIW, we are also having this same discussion about "import as type" in bundler land right now, specifically in Parcel. We don't have the same issues with security being discussed here, but do have a related issue where imported files could have multiple compiled representations that a user would like to choose from. See https://github.com/parcel-bundler/parcel/issues/3477. It would be nice to land somewhere standard syntax-wise across both bundlers and web browsers.

Our current thinking, without changing JavaScript syntax, is to add something in the module specifier (likely a protocol) to signal the type to import. I see that this is also being proposed here. I would also be happy with an extension to the import syntax for this, but I'd imagine that would be harder to get standardized (especially the list of supported module types).

caridy commented 5 years ago

About new "javascript syntax" for the module specifier, we had many discussions at tc39 about the module specifier, first for importer metadata, then for builtin modules, and few more during the last 5 years or so. Don't keep high hopes on changing that, the feedback has been consistent: why does ES need to know about this? isn't the flexibility on the module specifier sufficient?

annevk commented 5 years ago

Well, there's less flexibility now hosts have decided on a default behavior. And the option of using URL schemes for this seems rather ungainly.

littledan commented 5 years ago

I plan to raise this at TC39 in the December meeting, and hope to have a draft to share with this group some time in November. I agree that adding more syntax into the module specifier would be unfortunate, given the complexity of URLs as is.

ljharb commented 5 years ago

I’m not sure how that’s truly avoidable though, given the web’s insistence that the consumer (instead of the author) should dictate the parse goal/format.

GeoffreyBooth commented 5 years ago

I’m not sure how that’s truly avoidable though, given the web’s insistence that the consumer (instead of the author) should dictate the parse goal/format.

“Insistence” is a pretty judgmental word. The Web also insists that the consumer knows the URL of the resource that the consumer is trying to load, as well as the exported symbols (i.e. all the non-keywords in (import { shuffle } from 'https://foo.com/utils.js';). Knowing the type as well isn’t really such a huge imposition.

ljharb commented 5 years ago

then it seems like putting it in the specifier wouldn’t be such a huge imposition either.

devsnek commented 5 years ago

HTML, CSS, and JSON all have common usage extensions in IANA, could we just use those explicitly? e.g. importing JSON has to use .json and importing HTML has to use .html or .htm, and then the browser fails the resolve if the types don't match up.

shannon commented 5 years ago

Just to provide a use case for JSON in particular not having a .json extension would be REST APIs. I imagine there would be a good use case for fetching JSON in SPAs.

I’m not sure how that’s truly avoidable though, given the web’s insistence that the consumer (instead of the author) should dictate the parse goal/format.

I believe the main issue described here is that there is often a middle layer here. Where a request may be served by a third party. Both the consumer and the author should be able to specify which parse goal they intend.

FWIW as an outside observer, a URL scheme seems appropriate to me. It's very similar to tooling such as webpack using json-loader!./file.json.

annevk commented 5 years ago

From upthread:

Inferring meaning from a file extension is incompatible with the web's architecture.

devsnek commented 5 years ago

just to clarify, i'm saying to use it as an assertion, not as the determinant. in any case i think loading data from apis is motivating enough to throw the idea out.

WICG / webcomponents

HTML, CSS, and JSON modules shouldn't solely rely on MIME type to change parsing behavior #839