denoland / deno_emit

Transpile and bundle JavaScript and TypeScript under Deno and Deno Deploy
https://jsr.io/@deno/emit
MIT License
222 stars 23 forks source link

Trailing slash added to base URL in transpile load function specifier #123

Closed abflow closed 1 year ago

abflow commented 1 year ago

In the transpile function (possibly in the bundle function too but not tested), the load function's specifier parameter inconsistently includes a trailing slash when the URL does not contain a path (base URL).

Bug reproduction:

import { transpile } from "https://deno.land/x/emit/mod.ts";
import { assertEquals } from "std/testing/asserts.ts";

const testURL = "https://test.com";

const load = function (specifier) {
    assertEquals(specifier, testURL);
};

try {
    await transpile(testURL, { load });
} catch (error) {
    console.error(error);
}
Error: load rejected or errored: JsValue(AssertionError: Values are not equal.

    [Diff] Actual / Expected

-   https://test.com/
+   https://test.com

AssertionError: Values are not equal.

Expected behavior The specifier parameter should consistently exclude the trailing slash for base URLs and not mutate the actual specifier passed to the transpile function, or the behavior should be clearly documented so developers can adjust their code accordingly.

Actual behavior A trailing slash is added to the specifier when the URL does not include a path (base URL).

yacinehmito commented 1 year ago

According to the HTTP specification, a root URL with a trailing slash is a equivalent to a root URL without a trailing slash.

In RFC 2616:

3.2.2 http URL

The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs.

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path (section 5.1.2). The use of IP addresses in URLs SHOULD be avoided whenever possible (see RFC 1900 [24]). If the abs_path is not present in the URL, it MUST be given as "/" when used as a Request-URI for a resource (section 5.1.2). If a proxy receives a host name which is not a fully qualified domain name, it MAY add its domain to the host name it received. If a proxy receives a fully qualified domain name, the proxy MUST NOT change the host name.

The transformation is thus necessary in order to have a canonical form. Otherwise, root URLs with and without a trailing slash would be treated as different specifier, even though they point to the same resource.

The URL object part of the Web API enforces this. Try the following:

const url = new URL("https://test.com");
console.log(url.toString());
// ^ Will output "https://test.com/"

So it's not a bug.

(Also, the specifier is not mutated. Strings don't mutate in JS, so this behavior does not affect user-level code.)

abflow commented 1 year ago

The fact that two URLs are equivalent in the specification you mentioned does not mean that the user code should always enforce the transformation of one into another, by automatically adding a trailing slash to the base URL.

As a matter of fact, if one navigates to https://github.com, the browser will not enforce a trailing slash in the address bar. If one now navigates to https://github.com/, the server will enforce https://github.com and actually remove the trailing slash. You can try it right now in your browser:

// Navigation to https://github.com/:
[Diff] Actual (red) / Expected according to [RFC 2616] 3.2.2 http URL  (green)
-   https://github.com
+  https://github.com/

AssertionError: Values are not equal.

In our code, we purposefully did not use the new URL constructor and passed the value as a javascript string, in order to not have the constrained behavior from RFC 2616 that you presented.

Conclusion: It is up to users to define how are handled the trailing slashes in the servers they develop, and the trailing slash should not automatically be enforced at the level of the specifier by modifying the value of the URL string given as an argument to the transpile function and passed as an argument to the load function.

PS: the actual behavior is equivalent to a mutation as the specifier is used to access in-memory modules at runtime. As the specifier points to another value than testURL, it introduces a discrepancy in the user code, as it is expected that the values should be equal. It actually forces the user to do an equality check for the base URL in the load function in order to correct the behavior.

yacinehmito commented 1 year ago

The specifier has not changed. The one that is provided as argument to the load function is just what it resolves to in Deno's internals.

One cannot expect Deno to treat these specifiers differently. By spec, the two URLs points to the same resource. If Deno starts to differentiate between root URLs with or without trailing slashes, then it cannot efficiently cache them. It might be the behavior you want, but that will break things for other users.

The fact that the browser removes the trailing slash has no relationship with Deno's desired behavior. It's a UI thing. If you look at the network tab, you will see that the HTTP request that is sent by the browser is the same regardless of whether there is a trailing slash; and in all cases, the trailing slash is present in the request.

Can you please explain your use case so that we can understand how the current behavior can cause issues? Thank you.

abflow commented 1 year ago

Thank you. This caused an issue when mapping the specifier to a key in an object or a map with the base url without trailing slash. Now that the way the URL is defined internally is clarified, we’ll make sure that the base url always has a trailing slash in our implementation.