BorisMoore / jsrender

A lightweight, powerful and highly extensible templating engine. In the browser or on Node.js, with or without jQuery.
http://www.jsviews.com
MIT License
2.67k stars 339 forks source link

unicode characters in template expressions #342

Closed johan-ohrn closed 5 years ago

johan-ohrn commented 5 years ago

We have some autogenerated code and for whatever reasons we have javascript objects with property names using non english characters such as åäö. When we want to print a property from such an object we would type something like this in our template: {{:myObj.someÅpropÄerty}}

This doesn't work because the javascript generated from this template expression is invalid. +((v=data.myObj.someÅdata.propÄdata.erty)!=null?v:"");

I managed to track down the problem to a couple of regexes using \w character class, which doesn't support anything but ascii characters.

I made the following changing in jsrender.js - we use the separate files Lines 54-57:

    uW = "\\w\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6",

    //rPath = /^(!*?)(?:null|true|false|\d[\d.]*|([\w$]+|\.|~([\w$]+)|#(view|([\w$]+))?)([\w$.^]*?)(?:[.[^]([\w$]+)\]?)?)$/g,
    rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|([" + uW + "$]+|\\.|~([" + uW + "$]+)|#(view|([" + uW + "$]+))?)([" + uW + "$.^]*?)(?:[.[^]([" + uW + "$]+)\\]?)?)$", "g"),
    //        not                               object     helper    view  viewProperty pathTokens      leafToken

    //rParams = /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$_])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?\/]|(=))\s*|(!*?(@)?[#~]?[\w$.^]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|("))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g,
    rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?[" + uW + "$.^]+)?\\s*((\\+\\+|--)|\\+|-|~(?![" + uW + "$_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?[" + uW + "$.^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g"),

Seems kinda ugly to hard code the extra non-english characters like this. Perhaps you could make this user configurable or give unicode more generic support?

Even if using non-english characters for property names seem like a bad idea it's still valid javascript and as such I feel the same expression should be valid inside the template.

As a workarround I know we can type {{:myObj["someÅpropÄerty"]}} but this is ugly.

What's your take on the whole?

johan-ohrn commented 5 years ago

I forgot to mention this is on version 0.9.91 but the code is identical in version 1.0.0.

BorisMoore commented 5 years ago

It looks like providing generic support for unicode property names would be difficult without significant perf / file size cost, at least judging by this https://mothereff.in/js-variables#%E0%B2%A0_%E0%B2%A0, and the code behind it.

Providing extensibility is possible, so people can add specific unicode characters they want to support, like you did, via

$.views.settings.advanced({addJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6");

or similar.

But given the fact that you can do {{:myObj["someÅpropÄerty"]}}, as you say, I'm not sure about adding the setting... It still requires setting charactars manually, character by character...

johan-ohrn commented 5 years ago

I think a setting would be valuable for people in this situation. I agree that it's clumsy to have to individually add support for each character but at least it's a one time configuration.

In this particular case the offending property names are auto generated resource keys. We have a resource object like this:

resources: {
  för_lågt_antal_angivet: 'Entered quantity is to low',
  ...
}

In our view models we write code such as displayError(resources.för_lågt_antal_angivet)

and in our templates we do the same:

{{if quantity < 4}}
  {{:~resources.för_lågt_antal_angivet}}
{{/if}}

Having to use one syntax in the javascript files and an other in the template (sometimes) is unintuitive. Also developers are more comfortable with the dot notation instead of the index notation to access properties.

What's worse is when we need to change a resource. A developer would just copy/paste the new resource key in the template without considering that a property with non english characters need to be typed using the index notation. And because the change is so small it's also likely that it's not tested. It's just expected to work. In this contrived example there is an {{if}} surrounding the expression making it even more likely that this particular code path is not tested.

My 2 cents is that even though there is a viable workaround for the issue a setting would make this kind of code less error prone since you don't have to remember a special rule and more consistent with how you write code in the javascript viewmodel.

Also a setting like in the example you gave would allow for a small plugin/extension such as jsrender.latin1.js or whatever pre defined charset such that users wouldn't have to manually configure this.

BorisMoore commented 5 years ago

Yes, I am in two minds about doing this, given the test matrix, need to document etc. but FWIW the feature would be like this:

Additional code at line 164

var uW = value && value.addedJsNameChars;
if (uW) {
    uW = "[\\w" + uW + "$";
    rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|(" + uW + "]+|\\.|~(" + uW + "]+)|#(view|(" + uW + "]+))?)(" + uW + ".^]*?)(?:[.[^](" + uW + "]+)\\]?)?)$", "g");
    rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?" + uW + ".^]+)?\\s*((\\+\\+|--)|\\+|-|~(?!" + uW + "_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?" + uW + ".^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g");
}
return value

(Here it is: jsviews.js.txt)

Usage:

$.views.settings.advanced({addedJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});
var data = {foo.someÅpropÄerty:444};
...
johan-ohrn commented 5 years ago

Thanks. I'll try it out and will keep an eye out on this thread to see where it ends up.

BorisMoore commented 5 years ago

Ah, I had missed your comment above (the page had not been refreshed when I wrote my response above.)

Yes, I hear you... I'll let it mature a bit before deciding. Meantime, if you can test fully with the update I included above that would be helpful. Let me know if you see any issues/bugs.

BorisMoore commented 5 years ago

I have an alternative which I am looking at. I'll probably send you a new patch to test, in the coming days, rather than testing the version above...

johan-ohrn commented 5 years ago

Sounds good. In the meantime I use this quick fix.

BorisMoore commented 5 years ago

Here is an update: jsviews.js.txt

I managed to figure out a way of allowing all unicode chars, without needing to specifically specify the character codes. The default is not to support unicode chars in names, but you can write

$.views.settings.advanced({unicodeChars: true}); // support all unicode chars in JavaScript names

or else limit to specific ones:

$.views.settings.advanced({unicodeChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});

You can't have unicode characters on helpers, or helper properties, tag names or tag properties, etc. Only on properties of data passed in to the link() or render() method.

The generic unicode version, passing true, is done basically by replacing

/[\w]*/

by

/[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*/

so word characters are any characters other than the specified ones.

It adds 1% to the filesize (of jsrender.min.js), which I hesitate about, but at least the usage is easy now. (Assuming it works correctly!)

So if you are able to test it , it would be helpful, particularly to make sure it does indeed cover all unicode characters as expected (or at least test with a good range of characters...).

I am still considering whether or not to publish this feature...

johan-ohrn commented 5 years ago

I was able to create a test html page. Download here (sorry about the adds and wait time to download)

I extracted every character from the regex. The only character that appears to be invalid in fact in chrome is "ⸯ" which I have commented out in the html file.

I tested this with the jsviews version you provided. It throws an error.

I will test some more to see if I can isolate which characters are the culprit.

BorisMoore commented 5 years ago

Excellent. That is very helpful!

I did some testing with that, and all the characters seem to work. There was still an error from JsViews because of {{:aⸯa}} still being in the template markup. So I removed it.

Also, of course you need the line:

$.views.settings.advanced({unicodeChars: true});

before calling $.templates(...)

I also had some issues when using a file that did not have the BOM (with UTF-8 encoding).

But with those changes, the template rendered completely...

BorisMoore commented 5 years ago

Here is an updated jsviews.js and jsrender.js and a copy of my version of your test page:

jsviews.js.txt jsrender.js.txt testUnicode.html.txt

The test page works for me in Chrome and in Firefox, but not in IE, Edge, Safari... It seems that some of the unicode characters trigger javascript errors in those browsers. With a reduced set of unicode characters it works in all those browsers.

johan-ohrn commented 5 years ago

I was at the end of my shift and had been messing back and forth with this test page so I missed the obvious $.views.settings.advanced(...) call :)

Good that you made it work!

So we confirmed that the regex does indeed cover all unicode characters. The fact that it covers more than some browsers doesn't really matter I think.

Also since the regex does cover all unicode characters I don't see the need for a user to explicitly allow specific characters. Unicode on or off is sufficient. This should also result in a lower footprint for the feature.

If the option is to allow unicode or not then I see two possible implementations, well three. 1) Making unicode opt-in via $.views.settings.advanced({unicodeChars: true}); This alternative should ensure that no existing templates are broken by an unexpected bug in the new regex. Also adds more bytes to the final js file. 2) Somehow allow setting the full rPath and rParams regexes from outside jsviews.js. This would allow you to make unicode support opt-in from a jsviews-unicode.js file moving the overhead completely from jsviews.js itself. 3) Skip the option alltogether and just change the regex thus making unicode supported out of the box.

What's your take on these ideas?

BorisMoore commented 5 years ago

Yes, I agree with your analysis.

Option 2 is probably my preference.

Option 3 makes me nervous about possible bugs, especially doing it just after releasing v1.0.0 which is hopefully stable.

So option2 is a more conservative approach, given that this feature is unlikely to be used/needed by many users...

I'll post a version of approach 2 soon.

BorisMoore commented 5 years ago

Here is an update where you load the jsrender-unicode.js file after jsrender.js/jsviews.js, before calling $.templates(...) to compile any templates that have unicode characters in them.

Let me know if it works as expected. Thanks!

jsrender-unicode.js.txt jsrender.js.txt jsviews.js.txt

johan-ohrn commented 5 years ago

I gave it a try but run into some issues with jsrender-unicode.js

The regexes in jsrender-unicode.js is not the same as those we used here Is this intentional? Modifying jsrender-unicode.js to use those regexes seem to work (comments left out for abbrevity):

$.views.sub.rPath =
    /^(!*?)(?:null|true|false|\d[\d.]*|([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+|\.|~([\w$]+)|#(view|([\w$]+))?)([^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*?)(?:[.[^]([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)\]?)?)$/g;
$.views.sub.rPrm =
  /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?/]|(=))\s*|(!*?(@)?[#~]?[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|(\"))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g;

I also found some left over code and double comments (line numbers from jsrender.js): line 114: // subscription, e.g. JsViews integration line 161:

                var unicodeChars = value && value.unicodeChars;
                if (unicodeChars) {
                    setPathRegEx(unicodeChars);
                }

You might want to double check any changes related to this issue.

BorisMoore commented 5 years ago

Ah - sorry, I must have accidentally uploaded an intermediate version. Thanks for pointing out the stale code/comments, too!

So hopefully this is correct now. The regEx expressions are close to yours, but not quite identical. Let me know if you have any concerns, or see any issues.

jsrender-unicode.js.txt testUnicode.html.txt jsviews.js.txt jsrender.js.txt

johan-ohrn commented 5 years ago

I finally got to implement the fix in our code. Works perfectly so far! I'll test it some more and see if anything turns up.

BorisMoore commented 5 years ago

I'll include this in the next update (v1.0.1)

johan-ohrn commented 5 years ago

Glad to hear. No problems so far.

BorisMoore commented 5 years ago

Fixed in commit v1.0.1