boa-dev / boa

Boa is an embeddable and experimental Javascript engine written in Rust. Currently, it has support for some of the language.
MIT License
5.03k stars 398 forks source link

String interning #279

Closed HalidOdat closed 2 years ago

HalidOdat commented 4 years ago

Will there be a string interner for more affective memory use and string comparisons.

Like rustc does it.

HalidOdat commented 4 years ago

More specifically rustc interner.

Rustc Symbol doc:

An interned string.

Internally, a `Symbol` is implemented as an index, and all operations
(including hashing, equality, and ordering) operate on that index. The use
of `rustc_index::newtype_index!` means that `Option<Symbol>` only takes up 4 bytes
...

Which makes comparisons between symbols trivial, it is just a comparison between two u32.

The cost is when interning a string, since it requires to do a HashMap lookup. Getting the string from a symbol is trivial since it is only indexing into an array(Vec).

I found a implementation of this string-interner

jasonwilliams commented 4 years ago

Yep, this is interesting. The plan was always to move to something more efficient but start with strings first. (just to make it easier for implementation against the spec)

I had a conversation the with Spidermonkey devs about how we use strings for internal slots, but later want to move to something more faster. I think they do something like this.

I don't know very much about https://crates.io/crates/string-interner so this will be new to me, but im happy to let this issue be the conversation point.

jasonwilliams commented 4 years ago

https://matklad.github.io/2020/03/22/fast-simple-rust-interner.html

HalidOdat commented 4 years ago

Nice! This is pretty mush how rustc does it. and how string-interner crate is implemented.

Razican commented 4 years ago

I think that the string-interner crate is the way to go. It already provides a really nice, safe interner, and they are even discussing the optimisations laid out in the post you mentioned in https://github.com/Robbepop/string-interner/issues/16.

This could probably allow avoiding the use of String in TokenKind, which would make it Copy pretty easily, and reduce the memory footprint and improve the CPU cache usage.

Razican commented 4 years ago

I will give this a look, and maybe #336 too, at the same time.

HalidOdat commented 4 years ago

Hi @Razican

Correct me if I'm wrong!!!

String interning only has a good effect if the string that's it is interning is supposed to be immutable, like identifiers. I don't think string interning should be used in ValueData::String since it can mutate. For Example:

var x = "";
for(var i = 0; i < 100; i++) {
    x += "1"
}

Without string interning in ValueData::String, x will be one string in memory. With string interning in ValueData::String there will be in memory "", "1", "11", "111", etc. And these values will last till interner goes out of scope (pretty much the end of the program). So were in a way leaking memory (not really we still have ponters to the strings in memory).

Razican commented 4 years ago

String interning only has a good effect if the string that's it is interning is supposed to be immutable, like identifiers.

Yes, I was just trying a couple of things, but it does seem that way!

Razican commented 4 years ago

Interesting review of interners here. This could maybe co-exist with #565, or replace it.

With this review in mind, it's maybe worth taking a look at lasso too.

Razican commented 3 years ago

This might partially conflict with #736. We might need to create our own interner that allows using UTF-16 strings.

Razican commented 2 years ago

I think we can close this issue, as we are now using an interner in most of the engine.