Closed ericf closed 10 years ago
In my opinion, legit tokens (${key}
or {key}
) should throw an error and halt operation. Garbage tokens (${k ey}
or ${k\ney}
) should be left in the message as is because there is no legit way rule out an actual ${ }
use in the pattern outside of a token to identify a garbage token.
I think we should start by researching how this is handled elsewhere, in other programming languages, or if there's a ICU spec. that explains how these ought to be handled.
Once we have all the data, then I think we can form a better opinion about what to do in this package when these exceptional cases are encountered.
on the php MessageFormat is an exact match. if you have: "Welcome to {CITY}, {STATE}"
then you need to pass: array("CITY"=>"Sunnyvale", "STATE"=>"California");
having: "Welcome to {City}, {STA TE}" will FAIL, it returns FALSE
fails because you are passing "CITY" instead of "City", and "STATE" instead of "STA TE"
But is {STA TE} considered a valid token to be matched?
no it is not considered, needs to be an exact match.
No no, I mean does {STA TE} create a token to be looked for in the object. That being:
var msg = new IntlMessageFormat("${STA TE}");
var m = msg.format({ "STA TE": "California" });
does m
look like "California"
or "${STA TE}"
.
If STA TE
is a valid token, it should return "California". If it is not a valid token it is, it should return "${STA TE}".
I do not feel a token with white space should be considered a valid token. That's my 2¢.
token with spaces are not considered valid in the php implementation.
Based on what I could find:
C/C++/Java all use an Array of items with tokens as {0, type, options}
where the first option is the index in the Array. If the index isn't in the array (token is not found) the application fails, either gracefully or not.
PHP uses a similar approach but returns false
.
PHP also doesn't not identify with the associative array aspect (where the token is the key for the value in the array) as we are doing with Objects in JavaScript ({token: value}
). There is however a bug reported (and a fixed merged in) https://bugs.php.net/bug.php?id=61871 that allows for associative arrays to be used. This too returns false
when a token is either invalid or not found.
With taking the associative array/keyed array/hash table/object approach, providing users the ability to have complex tokens would not be worth it as it would greatly diverge from indexed array approach.
That being said:
do you mean that if we have: {STA1TE} is the biggest state of USA.
won't work? in php it works. but I don't mind limiting to [a-zA-Z] if that is a Javascript limitation.
for the unmatched I will say return false.
There are two different states: constructor and format
constructor
var msgA = new IntlMessageFormat("{STATE}");
// returns new object
// msgA.pattern === [{ valueName: 'STATE' }]
var msgB = new IntlMessageFormat("{ST ATE}");
// returns new object
// msgB.pattern === ["{ST ATE}"]
var msgC = new IntlMessageFormat("{ST1ATE}");
// returns new object
// msgC.pattern === [{ valueName: 'ST1ATE' }]
Given these objects...
format()
// msgA
msgA.format(); // throws error
msgA.format({ FOO: 'bar' }); // throws error
msgA.format({ STATE: 'Missouri' }); // "Missouri"
msgA.format({ "ST ATE: 'Missouri' }); // throws error
msgA.format({ ST1ATE: 'Missouri' }); // throws error
// msgB
msgB.format(); // throws error
msgB.format({ FOO: 'bar' }); // "{ST ATE}"
msgB.format({ STATE: 'Missouri' }); // "{ST ATE}"
msgB.format({ "ST ATE: 'Missouri' }); // "{ST ATE}"
msgB.format({ ST1ATE: 'Missouri' }); // "{ST ATE}"
// msgC
msgC.format(); // throws error
msgC.format({ FOO: 'bar' }); // throws error
msgC.format({ STATE: 'Missouri' }); // throws error
msgC.format({ "ST ATE: 'Missouri' }); // throws error
msgC.format({ ST1ATE: 'Missouri' }); // "Missouri"
The same would be the result regardless of the existence of the $
Does this sound right? Is there any other case that should be explored?
msgA
msgB and msgC invalid since the constructor. msgA is correct in the constructor and the rest of the examples on the format are throwing what is expected.
msgB and msgC should be... false? Since the constructor should return false? Or should throw an error?
Are we saying that any non [a-zA-Z] character between {
and }
should fail the message format creation?
throw error, my mistake.
we have a lot of cases where we have {0} is the capital of {1} which is the biggest state of {2}
so from my point of view we should support [a-zA-Z0-9_-] even php supports STA$TE , it does not support ! and other ones, so we can limit the support to only those.
I'm good with that. I'll add tests and adjust the code accordingly. After the pull request is issued, I'll link to this for comments to make sure they are captured.
When processing tokens, what happens to unmatched ones? Also what happens to tokens like
${foo bar}
(garbage)?