denoland / rusty_v8

Rust bindings for the V8 JavaScript engine
https://crates.io/crates/v8
MIT License

Managed external strings #1284

Open futscdav opened 1 year ago

futscdav commented 1 year ago

This issue is more of a question / feature request than anything.

I've been looking at how external strings are currently handled: static data only, which is limiting. There is clearly a good reason for this if the resulting API is to stay simple. However, I'm now at a point where being able to create managed external strings could be a large improvement.

A few words on the use case I ran into: I'm sending string values between JS and C# code. C# also happens to use a UTF-16 representation for its strings and can extend the lifetime of string data. As things stand, even if we take advantage of this and use the two-byte constructors for V8 strings, the buffer still has to be copied, with no option of sharing the data.

As I understand it (the documentation is fairly light on details), the V8 API lets you implement the external resource so that, when the V8 GC determines it no longer needs the data, the notification is relayed back through the Dispose method. This gets into territory that doesn't map easily onto the Rust side and would need careful design to work seamlessly. I realize this is a very niche use case, probably not worth the additional maintenance burden, which is why I'm raising it as a question before proposing any solutions or PRs.
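One way to picture the Dispose hand-off on the Rust side is to tie the release of the host buffer to Rust's `Drop`. The sketch below is a pure-Rust simulation and does not call rusty_v8 at all: `HostBuffer`, `ManagedTwoByteResource`, `managed_resource_dispose`, and `simulate_gc_cycle` are hypothetical names, and the atomic counter stands in for whatever release callback the C# side would actually expose.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a UTF-16 buffer whose lifetime the host extends.
pub struct HostBuffer {
    data: Vec<u16>,
    /// Stand-in for a "you may release this now" callback into the host.
    released: Arc<AtomicUsize>,
}

impl Drop for HostBuffer {
    fn drop(&mut self) {
        // Dropping the buffer is the moment the host gets notified.
        self.released.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical resource object that would be handed to the engine.
pub struct ManagedTwoByteResource {
    owner: HostBuffer,
}

impl ManagedTwoByteResource {
    pub fn data(&self) -> &[u16] {
        &self.owner.data
    }
}

/// What a C++ `Dispose()` override could call over FFI (assumption):
/// reclaim the box so the owner's `Drop` runs.
///
/// # Safety
/// `ptr` must come from `Box::into_raw` and must not be used afterwards.
pub unsafe extern "C" fn managed_resource_dispose(ptr: *mut ManagedTwoByteResource) {
    drop(unsafe { Box::from_raw(ptr) });
}

/// Simulates "create, let the engine read it, then GC disposes it".
/// Returns the release count before and after disposal.
pub fn simulate_gc_cycle(text: &str) -> (usize, usize) {
    let released = Arc::new(AtomicUsize::new(0));
    let owner = HostBuffer {
        data: text.encode_utf16().collect(),
        released: Arc::clone(&released),
    };
    let raw = Box::into_raw(Box::new(ManagedTwoByteResource { owner }));

    // The engine would read the shared buffer here without copying it.
    let units = unsafe { (*raw).data().len() };
    assert_eq!(units, text.encode_utf16().count());

    let before = released.load(Ordering::SeqCst);
    unsafe { managed_resource_dispose(raw) };
    (before, released.load(Ordering::SeqCst))
}
```

The point of the sketch is only the ownership shape: the engine holds a raw pointer, and the single place that turns it back into a `Box` is the dispose path.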

Can anyone see an elegant way to make this possible?

aapoalas commented 1 year ago

Hello! Just last week I implemented a fully build-time static one-byte "class" and have used the C++ API elsewhere to do managed strings.

Since we already have the precedent of my PR from last week, the basic idea of creating a C++ "class" inheritance on the Rust side is one that can be entertained. One would probably then want to implement this class so that it binds to a Rust trait, giving us a more familiar API.

The one problem that we'll run into is getting the external string data back out again in a managed way. For example, imagine that your managed class internally holds a reference-counted string coming from C#. V8 offers a way to ask if a string is external and a way to get the external string resource pointer out of the string. But now how do we get the reference-counted string? Do we just transmute the resource pointer into our custom class? Can we actually trust that this wasn't a static external string resource? Because if it was, the transmute will be invalid and a segfault will likely follow.

The way I had to solve this elsewhere was to simply keep a set of the managed resource pointers I'd created: if I found the resource pointer in the set, I could safely transmute it (a static cast, actually, since this was C++).
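The pointer-set approach can be sketched in Rust roughly as follows. This is a minimal illustration and not rusty_v8 API: `RefCountedString` and `ManagedRegistry` are hypothetical names, and the opaque `*const ()` stands in for the resource pointer V8 would hand back.

```rust
use std::collections::HashSet;

/// Hypothetical managed payload, e.g. a handle to a reference-counted string.
pub struct RefCountedString {
    pub payload: String,
}

/// Remembers the address of every managed resource this registry created.
#[derive(Default)]
pub struct ManagedRegistry {
    known: HashSet<*mut RefCountedString>,
}

impl ManagedRegistry {
    /// Moves the resource to the heap, records its address, and returns the
    /// opaque pointer that would be given to the engine.
    pub fn register(&mut self, resource: RefCountedString) -> *const () {
        let raw = Box::into_raw(Box::new(resource));
        self.known.insert(raw);
        raw as *const ()
    }

    /// Casts the pointer back only if this registry created it.
    pub fn try_get(&self, ptr: *const ()) -> Option<&RefCountedString> {
        let candidate = ptr as *mut RefCountedString;
        if self.known.contains(&candidate) {
            // SAFETY: the pointer is in the set, so it came from `register`
            // and has not been disposed yet.
            Some(unsafe { &*candidate })
        } else {
            None
        }
    }

    /// Frees a resource we own; returns false for unknown pointers.
    pub fn dispose(&mut self, ptr: *const ()) -> bool {
        let candidate = ptr as *mut RefCountedString;
        if self.known.remove(&candidate) {
            // SAFETY: removed from the set, so it is freed exactly once.
            drop(unsafe { Box::from_raw(candidate) });
            true
        } else {
            false
        }
    }
}

impl Drop for ManagedRegistry {
    fn drop(&mut self) {
        // Free anything that was never disposed so the sketch does not leak.
        for raw in self.known.drain() {
            drop(unsafe { Box::from_raw(raw) });
        }
    }
}
```

The cost is one hash lookup per check, which is the trade-off against pointer-tag or RTTI-style checks discussed in this thread.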

futscdav commented 1 year ago

Hi, I saw the PRs, really good work that moves things forward!

You raise an interesting point, my initial thought was to say that once an external string is presented to the API, there should be no way to interact with it through the API other than processing it once it's freed by the GC.

However, I can see the potential uses of getting that pointer out again, so that is a feature on its own too. Even in the C++ API you basically have two options: 1) as you said, keep a mapping of your resources or 2) use RTTI of ExternalStringResourceBase. I'm not sure what the implications of relying on RTTI here would be or if the code is even compiled with RTTI, but it would be a cheaper check than a map lookup, especially considering v8 basically already requires the class to have virtual functions.

mmastrac commented 1 year ago

I would be entirely fine with an API that "swallows" the string as a first pass. For Deno, we could definitely use a way to hand off one-byte or two-byte strings in a cheap way.

We're going to try @aapoalas's work in Deno in a small experiment to prove that the concept works and is safe cross-platform, and I think we can absolutely explore opening the API up in the meantime.

aapoalas commented 1 year ago

> You raise an interesting point, my initial thought was to say that once an external string is presented to the API, there should be no way to interact with it through the API other than processing it once it's freed by the GC.

As mentioned, I've elsewhere implemented a C++ class that allowed me to pass a reference counted string from a C++ library to Node and then back again from Node to C++. This, I felt, was pretty useful as it (hopefully) eliminated a lot of copies. I would definitely like to get something similar working for Deno.

> However, I can see the potential uses of getting that pointer out again, so that is a feature on its own too. Even in the C++ API you basically have two options: 1) as you said, keep a mapping of your resources or 2) use RTTI of ExternalStringResourceBase. I'm not sure what the implications of relying on RTTI here would be or if the code is even compiled with RTTI, but it would be a cheaper check than a map lookup, especially considering v8 basically already requires the class to have virtual functions.

I definitely want to avoid RTTI for a few reasons:

  1. It would be fairly terrible to have to interact with that from Rust :)
  2. V8 and rusty_v8 likewise, I believe, are compiled with -fno-rtti.

With that being said though, I think (or hope) that we can still get a really performant, Rusty API in place. I'm thinking something like this:

assert!(v8_string.is_external_two_byte());
let resource = v8_string.get_external_two_byte(); // This doesn't actually exist in V8 API so we'll have to make it.
assert!(resource.is_managed_resource()); // This checks that the vtable pointer points to our managed resource base vtable
let resource = resource.into_managed_resource(); // transmute
resource.some_trait_method(); // managed resource base contains a `Box<dyn StringResource>`(?) which is then used through a deref maybe
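To make the "vtable pointer" check in the snippet above concrete, here is a pure-Rust simulation that uses a hand-rolled function table in place of a real C++ vtable. Everything here is hypothetical (`ResourceVTable`, `ResourceHeader`, `ManagedResource`, `StaticResource`, `demo`), and whether a real C++ vtable address can be compared this reliably depends on the ABI and linkage, which the sketch does not address.

```rust
/// Hand-rolled function table standing in for a C++ vtable (assumption).
pub struct ResourceVTable {
    pub dispose: unsafe fn(*mut ResourceHeader),
}

/// Common first field of every resource, like a C++ object's vtable slot.
#[repr(C)]
pub struct ResourceHeader {
    pub vtable: &'static ResourceVTable,
}

#[repr(C)]
pub struct ManagedResource {
    header: ResourceHeader,
    pub payload: String,
}

#[repr(C)]
pub struct StaticResource {
    header: ResourceHeader,
    pub data: &'static [u8],
}

unsafe fn dispose_managed(ptr: *mut ResourceHeader) {
    // Only managed resources are heap-allocated, so only they are freed.
    drop(unsafe { Box::from_raw(ptr as *mut ManagedResource) });
}

unsafe fn dispose_static(_ptr: *mut ResourceHeader) {}

pub static MANAGED_VTABLE: ResourceVTable = ResourceVTable { dispose: dispose_managed };
pub static STATIC_VTABLE: ResourceVTable = ResourceVTable { dispose: dispose_static };

pub static STATIC_HELLO: StaticResource = StaticResource {
    header: ResourceHeader { vtable: &STATIC_VTABLE },
    data: b"hello",
};

impl ManagedResource {
    /// Allocates a managed resource and returns it as an opaque header pointer.
    pub fn into_raw(payload: String) -> *mut ResourceHeader {
        let boxed = Box::new(ManagedResource {
            header: ResourceHeader { vtable: &MANAGED_VTABLE },
            payload,
        });
        Box::into_raw(boxed) as *mut ResourceHeader
    }
}

/// # Safety
/// `ptr` must point to a live resource that starts with a `ResourceHeader`.
pub unsafe fn is_managed_resource(ptr: *const ResourceHeader) -> bool {
    std::ptr::eq(unsafe { (*ptr).vtable }, &MANAGED_VTABLE)
}

/// Casts to the managed type only when the table address matches.
///
/// # Safety
/// Same requirement as `is_managed_resource`.
pub unsafe fn as_managed_resource<'a>(ptr: *const ResourceHeader) -> Option<&'a ManagedResource> {
    if unsafe { is_managed_resource(ptr) } {
        Some(unsafe { &*(ptr as *const ManagedResource) })
    } else {
        None
    }
}

/// Returns the managed payload and whether the static resource was
/// (wrongly) classified as managed.
pub fn demo() -> (Option<String>, bool) {
    let managed = ManagedResource::into_raw("from C#".to_string());
    let statik = &STATIC_HELLO as *const StaticResource as *const ResourceHeader;
    let payload = unsafe { as_managed_resource(managed) }.map(|m| m.payload.clone());
    let static_is_managed = unsafe { is_managed_resource(statik) };
    unsafe { ((*managed).vtable.dispose)(managed) };
    (payload, static_is_managed)
}
```

Compared with a set of known pointers, the check is a single pointer comparison, at the price of relying on a layout convention (`#[repr(C)]` with the header first).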

So, rusty_v8 would offer a base resource class that internally contains a dynamic trait pointer. We'd have to make sure that this dynamic trait also supports essentially downcasting through some unique identifier or something... Maybe.
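The "base resource holding a dynamic trait pointer, with downcasting" idea could look roughly like this if the "unique identifier" is simply the `TypeId` that `std::any::Any` already provides. This is a sketch under that assumption, not a proposed rusty_v8 API; `StringResource`, `ManagedResourceBase`, `RcUtf16`, and `StaticUtf16` are hypothetical names.

```rust
use std::any::Any;
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical trait that user resources would implement.
pub trait StringResource: Any {
    fn as_two_byte(&self) -> &[u16];
    /// Exposes the concrete type for downcasting.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical base resource owning the user's trait object.
pub struct ManagedResourceBase {
    inner: Box<dyn StringResource>,
}

impl ManagedResourceBase {
    pub fn new<T: StringResource>(inner: T) -> Self {
        Self { inner: Box::new(inner) }
    }

    /// Returns the concrete resource if, and only if, it has type `T`.
    pub fn downcast_ref<T: StringResource>(&self) -> Option<&T> {
        self.inner.as_any().downcast_ref::<T>()
    }
}

/// Lets trait methods be called directly on the base, as in the snippet above.
impl Deref for ManagedResourceBase {
    type Target = dyn StringResource;

    fn deref(&self) -> &Self::Target {
        &*self.inner
    }
}

/// Example resource: a reference-counted UTF-16 buffer.
pub struct RcUtf16 {
    units: Rc<[u16]>,
}

impl RcUtf16 {
    pub fn from_text(text: &str) -> Self {
        let units: Vec<u16> = text.encode_utf16().collect();
        Self { units: Rc::from(units) }
    }
}

impl StringResource for RcUtf16 {
    fn as_two_byte(&self) -> &[u16] {
        &self.units
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Example of a different resource type, used to show a failed downcast.
pub struct StaticUtf16(pub &'static [u16]);

impl StringResource for StaticUtf16 {
    fn as_two_byte(&self) -> &[u16] {
        self.0
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

With this shape, `base.as_two_byte()` works through `Deref`, and `base.downcast_ref::<RcUtf16>()` recovers the original reference-counted value without any unsafe code on the Rust side.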

Something like that...?