dop251 / goja_nodejs

Nodejs compatibility library for Goja
MIT License
336 stars 81 forks source link

Initial implementation of the NodeJS URL class #40

Closed etiennemartin closed 1 year ago

etiennemartin commented 1 year ago

We've made extensive use of the goja runtime without our projects. One of the missing pieces was the ability to use NodeJS's URL functionality. We've decided to create this PR and give back the implementation we wrote for others to use. We hope this helps others seeking for URL support with goja.

While the implementation is very close to the NodeJS documentation, we did leave out a few corner cases where the behaviour might differ.

Limitations:

myURL = new URL('foo://Example.com/', 'https://example.org/');
// foo://Example.com/

myURL = new URL('https:Example.com/', 'https://example.org/');
// https://example.org/Example.com/

The unit tests written in Javascript are a port of all the examples given in the NodeJS documentation: https://nodejs.org/api/url.html

etiennemartin commented 1 year ago

I've added support for searchParams property on the the URL object as specified in https://nodejs.org/api/url.html. The only aspect missing is the global constructors of the Class URLSearchParams. The implementation is tied to the URL class for the time being.

I think the current implementation has enough for a starting point in the current state. We can always evolve this implementation if the need arises.

dop251 commented 1 year ago

Thanks for submitting! I haven't had time to look into the details, but the overall implementation framework needs some adjustments. In particular, creating the prototype for each instance is not the right way, both from performance and correctness point of view (for example, instanceof won't work property).

Overall it should look something like this:

package url

import (
    "net/url"
    "reflect"

    "github.com/dop251/goja"
    "github.com/dop251/goja_nodejs/require"
)

const ModuleName = "node:URL"

var reflectTypeURL = reflect.TypeOf((*url.URL)(nil))

func toURL(r *goja.Runtime, v goja.Value) *url.URL {
    if v.ExportType() == reflectTypeURL {
        if u := v.Export().(*url.URL); u != nil {
            return u
        }
    }
    panic(r.NewTypeError("Expected URL"))
}

func defineURLAccessorProp(r *goja.Runtime, p *goja.Object, name string, getter func(*url.URL) interface{}, setter func(*url.URL, goja.Value)) {
    var getterVal, setterVal goja.Value
    if getter != nil {
        getterVal = r.ToValue(func(call goja.FunctionCall) goja.Value {
            return r.ToValue(getter(toURL(r, call.This)))
        })
    }
    if setter != nil {
        setterVal = r.ToValue(func(call goja.FunctionCall) goja.Value {
            setter(toURL(r, call.This), call.Argument(0))
            return goja.Undefined()
        })
    }
    p.DefineAccessorProperty(name, getterVal, setterVal, goja.FLAG_FALSE, goja.FLAG_TRUE)
}

func createURL(r *goja.Runtime) goja.Value {
    proto := r.NewObject()

    defineURLAccessorProp(r, proto, "host", func(u *url.URL) interface{} {
        return u.Host
    }, func(u *url.URL, arg goja.Value) {
        u.Host = arg.String() // TODO: validate and ignore if not a valid host
    })
    // TODO add the remaining properties

    f := r.ToValue(func(call goja.ConstructorCall) *goja.Object {
        var u *url.URL

        u, _ = url.Parse(call.Argument(0).String())
        // TODO: implement proper constructor logic

        res := r.ToValue(u).(*goja.Object)
        res.SetPrototype(call.This.Prototype())
        return res
    }).(*goja.Object)

    f.Set("prototype", proto)
    return f
}

func Require(runtime *goja.Runtime, module *goja.Object) {
    module.Set("exports", createURL(runtime))
}

func Enable(runtime *goja.Runtime) {
    runtime.Set("URL", require.Require(runtime, ModuleName))
}

func init() {
    require.RegisterNativeModule(ModuleName, Require)
}
etiennemartin commented 1 year ago

Thanks for the feedback! I'll post an update to the PR with the changes you're suggesting. Can you elaborate a bit on the performance aspect of the change? I'm asking because I would love to avoid performance issues in the rest of our code base. Is there something specific I should be looking for.

Again thanks for the feedback, it's very much appreciated. 🙌🏻

dop251 commented 1 year ago

Nothing specific, it's just that allocating and populating a prototype object for each instance obviously takes its toll on performance.

etiennemartin commented 1 year ago

@dop251 I've updated the code to reflect the changes you put in your example.

I've removed the searchParams part of the implementation for now. I'll revisit it later since I'm not 100% sure how we can share data between two objects. Previously I had a shared wrapper around the url itself which is no longer the case. So editing the searchParams should update the URL and vice versa. I need to figure out how that's going to work before re-introducing it.

Did you see any issues with keep an external reference to the url.URL like I had previously? Or is there a way to update the parent relationship between the searchParam and the instance of URL.

dop251 commented 1 year ago

I would probably have a wrapper struct around url.URL, as such:

type urlWrapper struct {
    u *url.URL
    query url.Values
}

And have another type for the query:

type urlQuery urlWrapper

Then you could define the getter like this:

    defineURLAccessorProp(r, proto, "searchParams", func(u *urlWrapper) interface{} {
        res := r.ToValue((*urlQuery)(u)).(*goja.Object)
        res.SetPrototype(searchParamsProto)
        return res
    }, nil)

Then you could have defineURLParamsAccessorProp and define all getters and setters on searchParamsProto a similar way.

The urlWrapper.query should be set lazily the first time it's accessed. This prevents calling url.Query() each time because it does the parsing. If URLSearchParams is used as a standalone instance, urlWrapper.u will be nil.

BTW, I didn't look closely, the 'node:url' module does not export the URL class as its root, rather:

> require("node:url")
{
  Url: [Function: Url],
  parse: [Function: urlParse],
  resolve: [Function: urlResolve],
  resolveObject: [Function: urlResolveObject],
  format: [Function: urlFormat],
  URL: [class URL],
  URLSearchParams: [class URLSearchParams],
  domainToASCII: [Function: domainToASCII],
  domainToUnicode: [Function: domainToUnicode],
  pathToFileURL: [Function: pathToFileURL],
  fileURLToPath: [Function: fileURLToPath],
  urlToHttpOptions: [Function: urlToHttpOptions]
}
etiennemartin commented 1 year ago

BTW, I didn't look closely, the 'node:url' module does not export the URL class as its root, rather:

I was thinking about doing another PR for the searchParams. If we address the issue with the root would you be ok merging this PR in? I'm asking because we would love to reference the URL implementation from this repo instead of custom one we have right now.

BTW, Most of this work is helping power a load testing tool (DDoS) we use internally here at Shopify. We appreciate the work you've put into goja. This tool helps us scale test our systems.

etiennemartin commented 1 year ago

I've renamed the method that creates the constructor and the prototype so the Enable call works as expected. I'm confused how we can model the exports of the module to represent the structure you posted:

> require("node:url")
{
  Url: [Function: Url],
  parse: [Function: urlParse],
  resolve: [Function: urlResolve],
  resolveObject: [Function: urlResolveObject],
  format: [Function: urlFormat],
  URL: [class URL],
  URLSearchParams: [class URLSearchParams],
  domainToASCII: [Function: domainToASCII],
  domainToUnicode: [Function: domainToUnicode],
  pathToFileURL: [Function: pathToFileURL],
  fileURLToPath: [Function: fileURLToPath],
  urlToHttpOptions: [Function: urlToHttpOptions]
}

Since classes are a concept of ECMA 6, should the exports of the module simply be an empty object? Like follows:

func Require(runtime *goja.Runtime, module *goja.Object) {
    module.Set("exports", runtime.NewObject()) // Empty object
}

This would allow the URL functionality to be enabled via the Enable call, and allow us to expand on the URL exports in the future. (i.e. domainToASCII, domainToUnicode, etc...)

etiennemartin commented 1 year ago

Added the recommended changes. Let me know what you think.

etiennemartin commented 1 year ago

Nice catch on the port. I've updated the code to reflect your changes and updated the unit tests as well.

dop251 commented 1 year ago

I'm making some changes to this PR, and I have a question: did you try running test/url_test.js in nodejs? If so, what version?

dop251 commented 1 year ago

Ok, I've made some changes and merged it in https://github.com/dop251/goja_nodejs/pull/43. It's quite tricky to get full compatibility because escaping rules are slightly different. It will be even more tricky for the query parameters (I believe to get full compatibility it would require a custom parser).

etiennemartin commented 1 year ago

I took a look at the final changes you made. Looks good. Some really good clean up and code structure in there. Originally I wasn't sure how close we wanted the implementation to be. I can see now we want to be exact. I'll try and take a stab at the search parameters if that's ok.

Thanks again for the advice and merging this in! 🙌🏻