golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.85k stars 17.51k forks source link

html/template: refuse to parse complex JS template literals #60055

Open rolandshoemaker opened 1 year ago

rolandshoemaker commented 1 year ago

Forking from #9200.

Our minimal JS state machine inside of the template parser is limited enough that we cannot reasonably understand the semantics of JS template literal syntax. Rather than attempting to bodge a interpreter for nested literals, i.e. as suggested in https://go.dev/cl/484075, we propose instead to simply refuse to parse complex literals, with the justification that we cannot reasonably determine the safety of these literals.

In terms of defining "complex", we would bail if we saw any special character (`,",',$,{,}) after encountering `${. For example this would prevent parsing templates containing `${"asd"}`, `${`${...}`}`, `${let a = {};}`, etc, while still supporting the majority of JS template use cases (including multi-line strings, and basic template actions, etc).

cc @golang/security

jupenur commented 1 year ago

Presumably the idea is that this would allow detecting the end of a ${ ... } placeholder inside a template literal simply by looking for the next } character after ${? If so, surely the suggested set of characters to disallow (`,",',$,{,}) is insufficient?

Consider these:

let a = `${
  // this is evil };-)
  expression + goes + here
}`;

let b = `${ /* this is equally evil };-) */ foo + 3 }`;

let c = `${/[^}]*/.exec(input)[0]}`;

Preventing misinterpretation in these cases would require blocklisting /, which would then also stop simple use cases like `${foo/2}` from working.

rolandshoemaker commented 1 year ago

I think the suggested implementation would be more complex than simply looking for the next }.

We would transition from stateJSBqStr to the stateJS state when encountering ${, and set a bit that we are actually inside of a string interpolation expression (this could, probably, be a completely separate state). The parser would then (mostly) properly handle javascript comments, regexp, strings, etc. If we attempted to transition to another state, other than reentering stateJSBqStr, we would consider that an error.

We would probably also need a new state variable, similar to jsCtx (which is used for what the next / means), which indicates what the next } means, either closing an object or the string interpolation, based on what characters the parser encounters.

For instance, an incredibly basic interpretation of the parser state would be:

`${ let a = {...} }`
  ▲         ▲   ▲ ▲
  │         │   │ │
  │         │   │ exit stateJS (unset interp bit)
  │         │   │
  │         │   unset jsCtxObject
  │         │
  │         set jsCtxObject
  │
  enter stateJS (set interp bit)
jupenur commented 1 year ago

I'm not sure I understand. First you mention bailing out when encountering one of a specific set of special characters (`,",',$,{,}), because these are the "complex" cases. And you list an example of when a bailout would happen: `${let a = {};}`. This sounds like a good idea in general.

But now you're saying you'd still do "proper" handling of JavaScript inside placeholders when no bailout happens, and then use `${ let a = {...} }` as an example of how parser state would evolve during parsing. This seems like a contradiction to the previous.

Why would the parser bail out on `${let a = {};}` but not on `${ let a = {...} }`? The only difference is the semicolon, which is not in your original list of disallowed characters. And if the goal is not to simplify how placeholder endings are detected, what is the goal?

rolandshoemaker commented 1 year ago

Oh bah, sorry, I managed to copy everything from my draft except the last paragraph which actually explained what I meant 🤦.

Most things we already consider "complex" have their own states, quoted strings, regexp, etc, so we can immediately fail on those transitions (perhaps allowing the comment states?). Javascript objects (and any other brace delimited syntax) on the other hand don't, so we'd need to look for when we hit a character than would cause a change into jsCtxObject, and fail (a more ambitious project would be to continue parsing here, and attempt to actually properly support nesting literals, complex template syntax, etc. It is perhaps worthwhile at least exploring how feasible this is, but I suspect it's a hole that appears shallow, but immediately drops a thousand feet).

In the above example then when we hit the first { inside of the interpolation context, we'd stop parsing and return some error, cannot safely parse complex javascript template literal, or such.

rsc commented 1 year ago

This definition of complex seems fine. I do think it's important to keep the simple cases working (passing through), even if we can't substitute into them ourselves.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

dwrz commented 1 year ago

While I understand the importance of this change, it has introduced a limitation that doesn't appear to have an easy work around.

If I want to assign a multi-line template to a JavaScript variable, is my only remaining option to create a function to transform newlines to ones compatible with '' and "" JavaScript strings (i.e, replace '\n' with '+' and '\')?

I relied on template literals to implement dynamic forms with Go templates, but am struggling to find an alternative solution.

{{ define "date-row" }}
<td>{{ .Start }}</td>
<td>{{ .End }}
{{ end }}
<table>
  {{ range .Dates }}
    {{ template "date-row" . }}
  {{ end }}
</table>
<script>
  const dateRow = `{{ template "date-row" . }}`
  const addDateRange = () => {
    const row = document.createElement('tr');
    row.innerHTML = dateRow;

    const dates = document.getElementById('dates');
    dates.appendChild(row);
  };
</script>
rolandshoemaker commented 1 year ago

Revisiting this, I spent some time prototyping a slightly more complex state tracking change in http://go.dev/cl/507995, this is essentially the "more ambitious project" described in https://github.com/golang/go/issues/60055#issuecomment-1540362027.

This prototype implements string interpolation expression depth tracking, and switches between the template string state and top-level JS state when entering and exiting template expressions (i.e. `string state ${ JS state }`). This would allow us to continue allowing complex JS template literal expressions (i.e. including ${}, while still disallowing Go template actions within the string portion of the template (i.e. allowing `example ${ {{. }} }`, and disallowing `{{. }}`).

Unless someone spots a particularly egregious error I've made in the implementation, I think this presents us with three possible paths to go down:

  1. Continue with the proposal as written here, disabling template actions when the expression depth > 0.
  2. Switch to the approach suggested in CL 507995, disabling template actions within the string portion of template literals, but allowing them within the expression portion (and inherently allowing unlimited nesting).
  3. Support actions anywhere within JS template literals, but escaping actions in the string portion, such that an action within the string portion cannot insert a expression. Actions within expressions would be treated the same as actions anywhere else in the top-level JS context. (This is the most permissive approach, which reverts some of the decisions we've made so far, but would support some use cases users have brought up.)
jupenur commented 1 year ago

With http://go.dev/cl/507995 this template parses incorrectly:

<script>
        var foo = `x` + "${ {{.}}";
</script>

Passing in ; alert(1); // as the input produces this:

<script>
        var foo = `x` + "${ "; alert(1); //"";
</script>
dwrz commented 1 year ago

I really appreciate the efforts to improve the implementation. Does the Go team have any recommendations for the use case where a template with actions needs to be stored as a multi-line JavaScript string?

jupenur commented 1 year ago

@dwrz you could use a script data block:

<script type="application/x-template" id="date-row-template">
  {{ template "date-row" . }}
</script>

<script>
  const dateRow = document.getElementById('date-row-template').innerHTML;
  const addDateRange = () => {
    const row = document.createElement('tr');
    row.innerHTML = dateRow;

    const dates = document.getElementById('dates');
    dates.appendChild(row);
  };
</script>
rolandshoemaker commented 1 year ago

@jupenur 🤦 there was a silly bug where state wasn't properly reset, should be working as intended now, let me know if you can see any other edge cases.

jupenur commented 1 year ago

@rolandshoemaker d01c3bf and this template:

<script>
        var foo = `${ (_ => { return "x" })() + "${ {{.}}" }`;
</script>

...with the input + alert(1) + produces this:

<script>
        var foo = `${ (_ => { return "x" })() + "${ " + alert(1) + "" }`;
</script>
rolandshoemaker commented 1 year ago

Ah right, we need to track brace depth across JS contexts. I feel like this is slowly draining my life force...

ianlancetaylor commented 1 year ago

Alternative in #61619.