SerenityOS / serenity

The Serenity Operating System 🐞
https://serenityos.org
BSD 2-Clause "Simplified" License
30.65k stars 3.19k forks source link

LibJS: Allow index properties to have descriptors #2330

Closed mattco98 closed 4 years ago

mattco98 commented 4 years ago

Right now, any index property on an Object has no descriptors because they are stored in a separate Vector<Value>, and are not associated with any shapes. This leads to some odd behavior, such as the following:

// Serenity
> let o = { 1: 20 }
undefined
> Object.getOwnPropertyDescriptor(o, 1)
undefined
// Spidermonkey
> let o = { 1: 20 }
undefined
> Object.getOwnPropertyDescriptor(o, 1)
{ value: 20, writable: true, enumerable: true, configurable: true }

This problems also affects more than just that single Object method. Consider the following:

let o = {
  get 1() { return 10; }
}

Because getters and setters are always inserted into an object using put_own_property, the getter above will be inserted into the object's m_storage, not it's indexed m_elements. When accessing o[1], the object above will return undefined because o[1] will use get_by_index, which will not find the object in m_elements.

I know that the m_elements is an optimization thing, so I'm not quite sure what the best way to fix this problem is. At one point I thought just adding a m_is_indexed to Shape could fix it, but at that point we might as well just remove m_elements all together.

Another idea I had was to make m_elements a Vector<size_t>, where each element simply points to the index of a value stored in m_storage. This way, the indexed property would still cause and be affected by Shape transitions while still allowing array iteration to be cheap.

It's also very possible that this is how every JS engine works and I have no idea what I'm talking about :smiley:

Curious to get some feedback on this one.

awesomekling commented 4 years ago

Indeed, indexed properties can have uncommon attributes. In the vast majority of cases they won't though, so I think the best solution here would be having two modes:

  1. Common case fast-path. We store values in m_elements like we do now and they all have the same exact attributes.
  2. Uncommon case slow-path. We store values in an alternate data structure (instead of m_elements) that also has an attribute field for each entry.

In fact, we will need an alternate array storage format regardless, to handle sparse arrays where some madman inserts something at index 99999999. Currently, that will cause us to expand m_elements to fit that index, which is obviously not ideal!

Note that this should not require any shape transitions since shapes are only concerned with named properties.

alimpfard commented 4 years ago

get 1() { return 10; }

Dear lord...why? what's the use-case for such a thing?

mattco98 commented 4 years ago

Dear lord...why? what's the use-case for such a thing?

Likely for parity, since ({ get "a string"() { /* ... */ } }) and ({ get [myVar]() { /* ... */ } }) are valid, and at the very least, the computed property getters/setters definitely have clear use cases. I can't think of a direct use case for number/string literal getters/setters off the top of my head that wouldn't be better suited for a Proxy.

alimpfard commented 4 years ago

so essentially "because why not", just like quite a few other things in javascript. gotcha :)

mattco98 commented 4 years ago

So I've been playing around with this for a bit, and I'd like to share the design I have and get feedback on it before I continue any further.

The TL;DR is that the Object class will have two values for array storage:

struct ValueAndAttributes {
    Value value;
    u8 attributes;
};

Vector<Value> m_sequence_elements;
HashMap<u32, ValueAndAttributes> m_sparse_elements;

This design works nice, and I've wrapped those fields in another class with a nice iterator to make dealing with them easy, but with this method I have to do more HashMap lookups than I'd like. If I ever can't find a value in m_sequence_elements for small indexes, I have to do a HashMap lookup, which isn't great if the value isn't in there, especially if this is being done in a loop like an Array.forEach.

A thought I had was to place "markers" in m_sequence_elements if a value at that index exists in m_sparse_elements (this could be a new Value type, like Value::Type::IndexMarker or something). This way, if a small indexed property is being looked up that doesn't exist, I don't have to do a HashMap lookup if I don't see that marker value at that index.

For big values I'll always have to do that HashMap lookup, but I figure that's fine. I mostly just want to know if this is a good approach, and if so, if it's worth doing my marker idea. Let me know about any improvements you guys have, or if I should just scrap this and go a different way.

awesomekling commented 4 years ago

I would suggest something like this:

class Object {
    ...
    OwnPtr<IndexedPropertyStorage> m_indexed_property_storage;
};

Where an IndexedPropertyStorage is an abstract class like this:

class IndexedPropertyStorage {
public:
    // your access/iteration interfaces as pure virtuals here
};

class SimplifiedIndexedPropertyStorage {
public:
    // your access/iteration interfaces as overridden virtuals here
private:
    Vector<Value> m_elements;
};

class GenericIndexedPropertyStorage {
public:
    // your access/iteration interfaces as overridden virtuals here
private:
    Vector<ValueAndAttribute> m_packed_elements;
    HashMap<u32, ValueAndAttribute> m_sparse_elements;
};

In the generic variant, the "packed" elements are all the consecutive indexed elements starting from index 0. The "sparse" elements are all the elements outside the packed range, maybe with a little fudge factor beyond the .length to avoid unnecessary pessimizations.