Support Simple JS Escaping

kuza55 commented 8 years ago

You guys have done some excellent work, but I'm curious why you don't handle "simple" JS escaping where the template value is in a string?

Fully tokenizing JS is obviously a lot of work; but to get to being able to escape values inside String it seems like you only need to track comments & quotes.

Am I missing something, or are there other reasons you guys haven't approached this problem?

neraliu commented 8 years ago

it is a good question, right now, we have the logic to identify the template markup is being placed within script tag, however, we think that most of the time, the scriptable context is DYNAMIC and it is NOT the final output context, such that even we apply the filtering rule to the template markup, it is still vulnerable

taking the following example, the final output context is attribute single quoted and URI contexts, not scriptable context, attacker can easily bypass with "javascript:" in this case, if we apply the JS filtering, we will give the false sense of security to the developer, so we take the approach of warning the developer instead right now.

<script>
var input = "{{handlebars-template-markup}}";
document.html("<a href='" + input + "'>link</a>");
</script>

PS. the logic to detect template markup in scriptable context https://github.com/yahoo/secure-handlebars/blob/master/src/context-parser-handlebars.js#L761 https://github.com/yahoo/secure-handlebars/blob/master/src/context-parser-handlebars.js#L640

neraliu commented 8 years ago

I am curious what exact use case u want to support? and then we can brainstorm what is the best approach.

kuza55 commented 8 years ago

I don't have a specific use case in mind, I just think this is really useful tech that I would want to recommend to people building webapps. Frankly, I think these kinds of systems should be the default for templating in every language/framework.

You're right that there are probably a lot of apps that look exactly how you mention, but the approach you've taken of html encoding things in that context is one that other templating systems have taken before (in the form of static auto-escaping), and the result has been people just removing the escaping entirely rather than figure out how to properly escape the data. Or in the event handler scenario, they just assume that since the html escaping isn't breaking everything it's fine.

I haven't actually looked at what the warnings you have are, but the solution I saw in the slides about this talk for making it to go away involve reading the value out of a HTML attribute, which is fine, but doesn't really solve the issue of JS being dynamic.

I just feel like not everyone has the capability to review JS code for security, or even has access to a security person, but we should still make their lives incrementally easier when we can.

You can still keep emitting the warning if you like, but I feel getting the escaping right by default would be a better than getting it half-right sometimes (event handlers in HTML) and emitting a warning. I get so much output on my command line when I build my code that I may miss warnings entirely.

neraliu commented 8 years ago

I totally agree to your point that escaping right by default is the goal.

The workaround we are suggesting right now is to ask the developer to add the filter to specify the final output context. frankly, this workaround defeats our initial intention of analyzing context and applying the filter right automatically. it is manual approach. :( https://github.com/yahoo/secure-handlebars#warnings-and-workarounds

Recently, I have seen the trend of JS templating is evolving to another approach of transforming the template into the DOM before data binding, the main reason of this approach is have more responsive UI. interestingly, the by product of this transformation can limit the output context by its templating language, not similar to handlebars that developers can put output markup without limitation like the example above. in this case, our life is easier of just applying the filters correctly and automatically. Of course, I assume that their template parsers are correctly coded.

If we wanna to solve the JS context in handlebars, we need to solve one interesting challenge, what parser should we use for parsing the code within the <script> tag? please bear in mind that the code within the script tag is a mix of handlebars markup and JS code, the JS parser cannot parse it correctly. if we break down the handlebars markup and JS code into different data chunk, then we need to have a JS parser that can parse JS code chunk. it is a headache to me. :p

We can keep this issue open till we figure out the right solution. let me know if u have any ideas

kuza55 commented 8 years ago

So, I'm not really up on trends in JS templating, but here are some thoughts:

If the type is not application/javascript or missing then do what is done now. (To deal with cases such as text/template; maybe text/template should be special-cased somehow, I'm not sure what would be most intuitive)

If it is determined to be actually JS, then maintain a state machine that can transition between the states: General JS String (') String (") JS templating string Multi-line comment Single line comment Regex literal

I think this is all of the meaningful states that need to be tracked as per https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar

The transitions seem pretty straightforward, except for the Regex literal. I'm not sure how you differentiate between a regex and division without knowing more about JS. If there's no good heuristic then it might be a lot harder than I expected, though maybe there's a way to simply emit warnings in that case.

In any case, once you figure out which of those states you're in you can use the appropriate escaper, or provider a sterner warning.

YahooArchive / secure-handlebars

Support Simple JS Escaping #140