getify / You-Dont-Know-JS

A book series on JavaScript. @YDKJS on twitter.

Numeric property names remaining a number clarification #1791

Closed shneeki closed 2 years ago

shneeki commented 2 years ago

Yes, I promise I've read the Contributions Guidelines (please feel free to remove this line).


Please type "I already searched for this issue": I already searched for this issue.

Edition: (1st or 2nd) 2nd

Book Title: You Don't Know JS Yet: Objects & Classes

Chapter: Chapter 1: Object Foundations

Section Title: Property Names

Question: Is there a proof for the statement that "numeric" (or "numeric looking") property "names" will remain a number? I can't find another source confirming that, and checking this seems impossible as every method just coerces the value to a string.

getify commented 2 years ago

The spec is the source of truth here, though you can prove it to yourself without digging through the spec if you want.

For starters: https://tc39.es/ecma262/#sec-ordinaryownpropertykeys

This section of the spec indicates that numeric indexes (numeric properties) are enumerated first, before string properties. The mere fact that it says this implies that they are in fact kept separate (and treated differently) according to the spec.

Try this code:

var x = { "2": "world", 1: "hello" };
Object.keys(x);
// ['1', '2']

Note how, contrary to the typical assumption that insertion-order would dictate that '2' would be listed before '1' (or 1), here we indeed see that '1' gets re-ordered in the Object.keys(..) output to be listed first, proving that it was not merely coerced to a string at the point of the object literal definition, but kept as a number and "sorted" numerically. More digging in the spec would reveal how the object literal properties are indeed handled according to their primitive type there.
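
A slightly larger sketch of the same ordering rule, using `Reflect.ownKeys(..)` so that symbol keys show up too. The object and its key names are made up for illustration; the expected order follows the OrdinaryOwnPropertyKeys section linked above.

```javascript
// Illustrative object: keys are deliberately inserted "out of order".
const sym = Symbol("tag");
const obj = {
    b: "string key, inserted first",
    "10": "integer-looking string",
    [sym]: "symbol key",
    2: "number literal",
    a: "string key, inserted last",
};

// Integer-like keys come first in ascending numeric order, then the other
// string keys in insertion order, then symbols in insertion order.
const keys = Reflect.ownKeys(obj);
console.log(keys);
// [ '2', '10', 'b', 'a', Symbol(tag) ]

// Every non-symbol key is reported as a string, whatever it was written as.
console.log(keys.slice(0, 4).map((k) => typeof k));
// [ 'string', 'string', 'string', 'string' ]
```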

But why then does '1' come out instead of 1?

As you can see in that createArrayFromList algorithm, it calls ToString(..) on all elements. Why? I dunno. But it does.


As to the statement that number-looking strings are treated as numbers... it's complicated, but... there's reasoning behind it nonetheless.

First, try this code:

var x = [0,1,2];
x["1"] = "hello";
x;   // [0, 'hello', 2]

As you can see, x["1"] was treated as if you did x[1]. But! We used an array there, duh. What about a regular object!? OK, try this (in Chrome's dev-tools as I'm trying it):

var x = { };
x[1] = "hello";
x["1"] = "world";
x;   // {1: 'world'}

Ooo, see how Chrome dev-tools represented the property as 1 instead of as '1'? Hmmm...

Let's try:

var x = { "1": "hello" };
x;   // {1: 'hello'}

Wow. So even when we try to define it first (and only) as a string property... if it can be treated as a number, it IS treated as a number.

Back to our earlier example involving enumeration ordering:

var x = { "foo": "bar", "2": "baz" };
Object.keys(x);   // [ '2', 'foo' ]
x;   // {2: 'baz', 'foo': 'bar'}

Seems pretty clear from this observation that no matter whether you try a property like "2" or like 2, it's going to be treated as 2. Again, this is all in the spec, but... you can see it for your own eyes right here.

Q.E.D.

sdegueldre commented 2 years ago

console.log(typeof Object.keys({1: "a"})[0]); // string

https://tc39.es/ecma262/#integer-index

An integer index is a String-valued property key that is a canonical numeric String (see 7.1.21) and whose numeric value is either +0𝔽 or a positive integral Number ≤ 𝔽(2^53 - 1). An array index is an integer index whose numeric value i is in the range +0𝔽 ≤ i < 𝔽(2^32 - 1).

IMO the correct reading of the spec is not that "2" is treated like 2, but that 2 is treated like "2" and that numeric strings are treated specially. I think it's misleading to write that they're coerced to numbers. They aren't: "2.0" coerces to 2 just fine, but using "2.0" as a key keeps a key that's "2.0" and not 2. Non-Symbols, when used as keys, are coerced to strings and then checked for numericity.
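
A small sketch of that reading: property keys are converted to strings first, so two spellings name the same property only when they produce the same string. `looksCanonical` is a hypothetical helper (not a spec or built-in function) that only approximates the "canonical numeric String" check by round-tripping through `Number`.

```javascript
const obj = {};
obj[2] = "number 2";
obj["2"] = "string '2'";      // same key as obj[2]: String(2) === "2"
obj[2.0] = "number 2.0";      // same key again: String(2.0) === "2"
obj["2.0"] = "string '2.0'";  // a different key: "2.0" !== "2"
obj["02"] = "string '02'";    // also a different key

console.log(Object.keys(obj));
// [ '2', '2.0', '02' ]
console.log(obj[2]);
// number 2.0

// Hypothetical helper: does the string survive a round trip through Number?
function looksCanonical(key) {
    return String(Number(key)) === key;
}
console.log(["2", "2.0", "02"].map(looksCanonical));
// [ true, false, false ]
```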

sdegueldre commented 2 years ago

as a side note, in the relevant section, you have

anotherObj = {
    // ...
    myObj: "<-- ...and so will this one"
};

Which ought to be

anotherObj = {
    // ...
    [myObj]: "<-- ...and so will this one"
};
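
To see why the brackets matter, here is a minimal sketch (the `myObj` value and the message strings are placeholders, not the book's exact listing). Without brackets the key is the literal identifier name; with brackets the expression is evaluated and its result is converted to a string.

```javascript
const myObj = {};

const withoutBrackets = { myObj: "plain identifier key" };
const withBrackets = { [myObj]: "computed key" };

console.log(Object.keys(withoutBrackets));
// [ 'myObj' ]
console.log(Object.keys(withBrackets));
// [ '[object Object]' ]
```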

getify commented 2 years ago

Is it your belief then that Chrome dev-tools is wrong in choosing to represent in its output that the property is 1 or 2 and should instead be representing it as '1' or '2', respectively?

I don't think it's an accident that chrome devtools is representing the numeric strings as numbers... it's because that's how they're treated. And indeed that might be how v8 literally stores them.

But whether they're actually stored that way in any given engine, or whether they're stored as strings but converted to numbers for the purposes of either the enumeration order or devtools output, I think we're off into rabbit hole debate about implementation.

So bringing things back to both the spec and to the interpretations and mental models people should draw from it, I guess I'd say that if it acts as a number, it's a reasonable mental model to think of it as being a number. I don't see it as more advantageous to hold a more complicated mental model where strings keep being converted to be treated as numbers. I dunno why the spec seems to do that, but it doesn't seem useful to JS devs to reflect that.

getify commented 2 years ago

I think I do see a slight issue with how I've presented things currently, which I think is at least part of your point: that "treating as a number" and "number looking" actually should be "treating as an integer" and "integer looking". Indeed "2.0" looks like a number, but it's not treated as a number here; it does not however "look like an integer".

So anyway, I'll at least clean up that imprecise wording.

sdegueldre commented 2 years ago

The problem I have is that you wrote that strings are coerced to numbers and that is factually not the case nor does the spec imply that it should in any way, and it functionally does not behave that way in any browser that I know of, again, "2.0", when used as a key, stays "2.0" and a distinct key from "2". There is also no way to extract numeric keys from objects or arrays as numbers that I know of without converting them from string by hand.

Object.keys({"2.0": 1, 2: 1}); // ["2", "2.0"]
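
The by-hand conversion mentioned above might look like this sketch. `integerKeys` is a hypothetical helper name; it keeps only keys that round-trip through `Number` as non-negative safe integers, then converts them.

```javascript
// Hypothetical helper: return the integer-looking keys of obj as numbers.
function integerKeys(obj) {
    return Object.keys(obj)
        .filter((key) => {
            const n = Number(key);
            // Reject "foo" (NaN), "2.0" (does not round-trip), and negatives.
            return Number.isSafeInteger(n) && n >= 0 && String(n) === key;
        })
        .map(Number);
}

const sample = { "2.0": "a", 2: "b", foo: "c", 10: "d" };
console.log(Object.keys(sample));
// [ '2', '10', '2.0', 'foo' ]
console.log(integerKeys(sample));
// [ 2, 10 ]
```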

Is it your belief then that Chrome dev-tools is wrong in choosing to represent in its output that the property is 1 or 2 and should instead be representing it as '1' or '2', respectively?

The chrome dev-tools (and the ff dev-tools for that matter) do not wrap any of the keys in quotes when displaying them in the console (in ff: unless you would need quotes to write them in an object literal, such as when they contain spaces), and they do not use the same syntax highlighting as they do for numbers when displaying those keys.

"treating as a number" and "number looking" actually should be "treating as an integer" and "integer looking"

That's still incorrect as strings that represent numbers greater than or equal to 2^32-1 will be integer looking but not treated as array indices and ordered after:

Object.keys({"4294967295": 1, "4294967294": 1}); // [ "4294967294", "4294967295" ]

(side note, why the spec says less than 2^32-1 and not less than or equal to 2^32-1 is beyond me)
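
One way to see where that bound comes from, offered as a sketch rather than a statement of the spec authors' intent: an array's `length` is capped at 2^32 - 1, and `length` is always one more than the largest index, so the largest usable index is 2^32 - 2.

```javascript
const arr = [];

arr[2 ** 32 - 2] = "largest array index";   // sparse; no huge allocation
console.log(arr.length);
// 4294967295

arr[2 ** 32 - 1] = "not an array index";    // stored as an ordinary property
console.log(arr.length);
// 4294967295

// A length of 2^32 is rejected outright.
try {
    arr.length = 2 ** 32;
} catch (err) {
    console.log(err instanceof RangeError);
    // true
}
```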

And again, there is no way to access those keys as numbers out of the box. For all intents and purposes, as far as JS is concerned, they are strings and are treated as such everywhere; they just have special behaviour for enumeration. I think that thinking of object keys as being "numbers or strings" is a less useful, more complicated, and less accurate mental model than thinking of keys as all being strings.

getify commented 2 years ago

again, "2.0", when used as a key, stays "2.0"

Yes, and I already fixed that imprecision with switching to talking about "integer looking" strings instead of "number looking" strings.

you wrote that strings are coerced to numbers

Again, I already changed that in the above referenced commit... the wording is no longer "coerced to a number" but is "be treated as an integer property name".

that is factually not the case nor does the spec imply that it should in any way

Well, you and I disagree I guess on the weight that should be given to the fact that the spec does define special behavior for integer-indices during enumeration ordering. Moreover, as the code snippets I provided prove, a string like "2" gets treated the same as an actual integer like 1 with respect to this enumeration ordering -- both get treated as integers, and integer indices are enumerated in numerical order rather than insertion order, AND they all get enumerated before any string properties get enumerated.

In other words, I think there's a clear mental model implied there: integer indexes (as well as strings that look like integer indexes) are treated like they're numeric integers, not as if they're any ol' string property name.

I give that implication, and what I can observe in code confirming it, an immense amount of weight. It proves my point as far as I am concerned. You don't agree. That's fine.

There is also no way to extract numeric keys from objects or arrays as numbers that I know of without converting them from string by hand.

I furthermore don't care that much that the enumerations themselves normalize all property names to strings. I think that's a little weird, but I can see why having a normalized data-type in the output array or iteration might be helpful in some scenarios. So the fact that the spec says to call ToString(..) on the integer index elements before exposing them to our JS code, that doesn't bother me enough to persuade me to flip my mental model back: that JS is actually handling them all equally as strings. It's just not.

and it functionally does not behave that way in any browser that I know of

I've said already I don't think we should get too into the weeds about JS engine implementations, but... https://v8.dev/blog/fast-properties

That blog post does in fact confirm that v8 stores integer indexes (whether it's an array or general object) -- v8 calls them "elements" -- separately, as integers, from the string properties.

At issue is whether a property name in an object literal (or in [ ] brackets) that is the string "2" will be treated as an element (integer index) or as a string property. Again, the code snippets I shared above prove that "2" will be treated as 2 for the purposes of enumeration ordering. How it was physically stored in the engine is secondary (but v8 does it like I asserted)... what matters most is how the integers (or integer looking things) get sorted and enumerated before other strings. That's just incontrovertible fact.

That's still incorrect as strings that represent numbers greater than or equal to 2^32-1

OK, fair enough. But technically, JS does not use "integer" for integer-looking numbers greater than 2^32-1, it uses either "number" or, if you explicitly choose, "bigint". So my "integer looking" already, in a sense, includes this detail... to "look like" an integer, it has to be small enough to fit into JS's integer type (an internal spec type, not a language type exposed to code).

But that's a nuance that I don't think overrides my main point. I'd wager it's quite rare that people are using small AND large numbers as array integers, alongside other string properties, AND are still expecting those large integer-looking property names to get sorted with the smaller integer-looking property names. That's way in the weeds, and doesn't disprove my overall mental model.

Could I change "integer looking" to "integer looking (less than 2^32)"? Sure. But I think that would distract readers more than help them.


Taking a step back, in summary, where I think we really disagree is on which of these two mental models makes more sense (or is more complicated for readers/devs):

  1. MY MODEL At inbound time (defining/creating/setting a property on an array or object), if the property name given is actually an integer, or if it looks like an integer -- it's a string of digits 0-9 that can be coerced to a JS integer (less than 2^32) -- then JS actually treats it as an integer and dumps it into a different bucket of property names (v8 calls them elements) that are integer indices. When JS needs to sort property names for enumeration, it uses these two buckets of property names differently. When JS actually gives you those property names, it gives them all to you as strings, for type consistency.

  2. YOUR MODEL (as I think I understand it) At inbound time (defining/creating/setting a property on an array or object), all property names are treated as strings, regardless of what they look like, and thus they all get dumped in the same bucket. Later, when sorting property names for enumeration ordering, it looks at any string that looks like an integer and on-the-fly converts it to a number, for its sorting only... but otherwise keeps treating it as a string once it dumps the property names out.
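
Whichever model one prefers, the observable ordering is the same, and it can be reproduced from plain strings. The sketch below is an illustration, not engine code: `isArrayIndex` and `orderKeys` are hypothetical helpers that take string keys in insertion order and apply the rule from the spec section linked earlier (array indices ascending, then the remaining strings in insertion order).

```javascript
// Hypothetical helper: is this string an array index per the spec wording
// quoted earlier (canonical numeric string, 0 <= value < 2^32 - 1)?
function isArrayIndex(key) {
    const n = Number(key);
    return Number.isInteger(n) && n >= 0 && n < 2 ** 32 - 1 && String(n) === key;
}

// Hypothetical helper: order string keys the way enumeration does.
function orderKeys(insertionOrder) {
    const indices = insertionOrder.filter(isArrayIndex);
    const others = insertionOrder.filter((key) => !isArrayIndex(key));
    indices.sort((a, b) => Number(a) - Number(b));
    return [...indices, ...others];
}

const inserted = ["foo", "2", "2.0", "10", "4294967295", "1"];
const predicted = orderKeys(inserted);

// Build a real object with the same keys in the same insertion order.
const real = {};
for (const key of inserted) real[key] = true;

console.log(predicted);
// [ '1', '2', '10', 'foo', '2.0', '4294967295' ]

// Should print true in an engine that follows the spec ordering.
console.log(Object.keys(real).join() === predicted.join());
```

Note that the helpers never store anything as a number; they only parse the strings while sorting. That does not settle which model is "right", it only shows the output alone cannot distinguish them.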

I think your claim is that your model is both more true to the spec and simpler to understand. I think your model may be truer to the spec in parts, but not in others. I think the truth of the spec is more complicated and sits somewhere in between the two models. But notably, I don't agree that your model is simpler to understand/explain. And moreover, it happens to disagree with how v8 actually works.

In any case, if we've analyzed fully our disagreement, I'm happy to leave it at that. I appreciate your feedback even if ultimately I disagree with it. I think I improved the clarity of my text based on this feedback, and for that I'm thankful.

sdegueldre commented 2 years ago

Recall that any string property name on an object that "looks like" an integer -- is able to be validly coerced to a numeric integer

I maintain that there is no coercion going on, and also "numeric integer" makes no sense.

As for the rest of it, fair enough I suppose. Thanks for taking the time to evaluate the feedback and take it into account. Unless @shneeki has any objections, as far as I'm concerned this issue is resolved and may be closed.

getify commented 2 years ago

Is your objection literally to the word "coerced" where a phrase like "parsed" or "treated as" would lessen the concern?

Or are you objecting to the deeper assertion that strings which look like integers end up behaving as integers for the purposes of sorting?

And moreover, is the objection centered on when this difference in behavior happens (at the point of "input" or at the time of "output")?

getify commented 2 years ago

Appreciate the feedback in this thread. Going to close for now, but if there's further discussion to consider please feel free to speak up!