alwinb / reurl

URL manipulation library that supports relative URLs in a way that is compatible with the WHATWG URL Standard.
MIT License

ReURL

ReUrl is a library for parsing and manipulating URLs. It supports relative and non-normalized URLs and a number of operations on them. It can parse, resolve, normalize and serialize URLs in separate phases, in a way that conforms to the WHATWG URL Standard.

Motivation

I wrote this library because I needed a library that supported non-normalized and relative URLs but I also wanted to be certain that it followed the specification completely.

The WHATWG URL Standard defines URLs in terms of a parser algorithm that resolves URLs, normalizes URLs and serializes URL components in one pass. Thus, to implement a library that follows the standard but also supports a versatile set of operations on relative and non-normalized URLs, I had to disentangle these phases from the specification and, to some extent, rephrase the specification in more elementary terms.
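
The single-pass behaviour is easy to observe with the `URL` class built into Node.js and browsers, which implements the WHATWG parser. The snippet below is only an illustration and does not use ReUrl; the example URLs are made up.

```javascript
// The built-in WHATWG URL class resolves, normalizes and serializes at once.
const resolved = new URL ('../b/./c', 'http://Example.com/a/d')
console.log (resolved.href) // => 'http://example.com/b/c'

// A relative URL cannot be represented on its own:
// parsing it without a base URL throws a TypeError.
let relativeError = null
try { new URL ('/foo/bar') }
catch (error) { relativeError = error }
console.log (relativeError instanceof TypeError) // => true
```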

Eventually I came up with a small 'theory' of URLs that I found very helpful and I based the library on that. Over time, this theory has become thoroughly documented in this new URL Specification.

Installation

Node.js

npm install reurl

Standalone minified build

git clone https://github.com/alwinb/reurl.git
cd reurl
make all
cp dist/reurl.min.js /my/project/js/

API

Overview

The ReUrl library exposes an Url class and a RawUrl class with an identical API. Their only difference is in their handling of percent escape sequences.

In a Node.js project, you can use these classes as follows:

import { Url, RawUrl } from 'reurl'

Note: ReUrl is an ESM-only module, so it cannot be imported with require.

Url

For Url objects the URL parser **decodes** percent escape sequences, getters report percent-decoded values and the _set_ method assumes that its input is percent-decoded unless explicitly specified otherwise.

```javascript
var url = new Url ('//host/%61bc')
url.file // => 'abc'

url = url.set ({ query:'%def' })
url.query // => '%def'
url.toString () // => '//host/abc?%25def'
```

RawUrl

For RawUrl objects the parser **preserves** percent escape sequences, getters report values with percent escape sequences preserved and _set_ expects values in which % signs start a percent escape sequence.

```javascript
var url = new RawUrl ('//host/%61bc')
url.file // => '%61bc'

url = url.set ({ query:'%25%64ef' })
url.query // => '%25%64ef'
url.toString () // => '//host/%61bc?%25%64ef'
```
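
The difference between the two classes can be pictured with a few plain functions. This is only a sketch of the idea, not ReUrl's implementation: `decodedView`, `serializeDecoded` and `rawView` are hypothetical names, and the real library percent-encodes more than just the `%` sign.

```javascript
// What a Url getter would report for a parsed component: escapes are decoded.
const decodedView = raw => decodeURIComponent (raw)

// Serializing a percent-decoded value: a literal % has to be escaped as %25.
const serializeDecoded = value => value.replace (/%/g, '%25')

console.log (decodedView ('%61bc'))     // => 'abc'
console.log (serializeDecoded ('%def')) // => '%25def'

// A RawUrl keeps the escapes, so the same input passes through unchanged.
const rawView = raw => raw
console.log (rawView ('%61bc'))         // => '%61bc'
```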

Url and RawUrl objects are immutable. Modifying URLs is accomplished through methods that return new Url and/or RawUrl objects, such as the url.set (patch) method described below.

Constructors

new Url (string [, conf])

Construct a new Url object from an URL-string. The optional _conf_ argument, if present, must be a configuration object as described below.

```javascript
var url = new Url ('sc:/foo/bar')
console.log (url)
// => Url { scheme: 'sc', root: '/', dirs: [ 'foo' ], file: 'bar' }
```

new Url (object [, conf])

Construct a new Url object from any object, possibly an Url object itself. The optional _conf_ argument, if present, must be a configuration object as described below. Throws an error if the object cannot be coerced into a valid URL.

```javascript
var url = new Url ({ scheme:'file', dirs:['foo', 'buzz'], file:'abc' })
console.log (url.toString ())
// => 'file:foo/buzz/abc'
```

conf.parser

You can pass a configuration object with a **parser** property to the Url constructor to trigger scheme-specific parsing behaviour for relative, scheme-less URL-strings. The scheme determines support for Windows drive letters and backslash separators. Drive letters are only supported in `file` URL-strings, and backslash separators are limited to `file`, `http`, `https`, `ws`, `wss` and `ftp` URL-strings.

```javascript
var url = new Url ('/c:/foo\\bar', { parser:'file' })
console.log (url)
// => Url { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
```

```javascript
var url = new Url ('/c:/foo\\bar', { parser:'http' })
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
```

```javascript
var url = new Url ('/c:/foo\\bar')
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
```

Properties

Url and RawUrl objects have the following optional properties.

url.scheme

The scheme of an URL as a string. This property is absent if no scheme part is present, e.g. in scheme-relative URLs.

```javascript
new Url ('http://foo?search#baz') .scheme // => 'http'
```

```javascript
new Url ('/abc/?') .scheme // => undefined
```

url.user

The username of an URL as a string. This property is absent if the URL does not have an authority or does not have credentials.

```javascript
new Url ('http://joe@localhost') .user // => 'joe'
```

```javascript
new Url ('//host/abc') .user // => undefined
```

url.pass

A property for the password of an URL as a string. This property is absent if the URL does not have an authority, credentials or password.

```javascript
new Url ('http://joe@localhost') .pass // => undefined
```

```javascript
new Url ('http://host') .pass // => undefined
```

```javascript
new Url ('http://joe:pass@localhost') .pass // => 'pass'
```

```javascript
new Url ('http://joe:@localhost') .pass // => ''
```

url.host

A property for the hostname of an URL as a string. This property is absent if the URL does not have an authority.

```javascript
new Url ('http://localhost') .host // => 'localhost'
```

```javascript
new Url ('http:foo') .host // => undefined
```

```javascript
new Url ('/foo') .host // => undefined
```

url.port

The port of (the authority part of) an URL: either a number or the empty string, if present. The property is absent if the URL does not have an authority or a port.

```javascript
new Url ('http://localhost:8080') .port // => 8080
```

```javascript
new Url ('foo://host:/foo') .port // => ''
```

```javascript
new Url ('foo://host/foo') .port // => undefined
```

url.root

A property for the path-root of an URL. Its value is `'/'` if the URL has an absolute path. The property is absent otherwise.

```javascript
new Url ('foo://localhost?q') .root // => undefined
```

```javascript
new Url ('foo://localhost/') .root // => '/'
```

```javascript
new Url ('foo/bar')
// => Url { dirs: [ 'foo' ], file: 'bar' }
```

```javascript
new Url ('/foo/bar')
// => Url { root: '/', dirs: [ 'foo' ], file: 'bar' }
```

It is possible for file URLs to have a drive, but not a root.

```javascript
new Url ('file:/c:')
// => Url { scheme: 'file', drive: 'c:' }
```

```javascript
new Url ('file:/c:/')
// => Url { scheme: 'file', drive: 'c:', root: '/' }
```

url.drive

A property for the drive of an URL as a string, if present. Note that the presence of drives depends on the parser settings and/or URL scheme.

```javascript
new Url ('file://c:') .drive // => 'c:'
```

```javascript
new Url ('http://c:') .drive // => undefined
```

```javascript
new Url ('/c:/foo/bar', { parser:'file' }) .drive // => 'c:'
```

```javascript
new Url ('/c:/foo/bar') .drive // => undefined
```

url.dirs

If present, a _nonempty_ array of strings. Note that the trailing slash determines whether a component is part of the **dirs** or set as the **file** property.

```javascript
new Url ('/foo/bar/baz/') .dirs // => [ 'foo', 'bar', 'baz' ]
```

```javascript
new Url ('/foo/bar/baz') .dirs // => [ 'foo', 'bar' ]
```

url.file

If present, a non-empty string.

```javascript
new Url ('/foo/bar/baz') .file // => 'baz'
```

```javascript
new Url ('/foo/bar/baz/') .file // => undefined
```

url.query

A property for the query part of `url` as a string, if present.

```javascript
new Url ('http://foo?search#baz') .query // => 'search'
```

```javascript
new Url ('/abc/?') .query // => ''
```

```javascript
new Url ('/abc/') .query // => undefined
```

url.hash

A property for the hash part of `url` as a string, if present.

```javascript
new Url ('http://foo#baz') .hash // => 'baz'
```

```javascript
new Url ('/abc/#') .hash // => ''
```

```javascript
new Url ('/abc/') .hash // => undefined
```
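
The way a path is divided over **root**, **dirs** and **file** can be sketched with a small standalone function. `splitPath` is a hypothetical helper written for this illustration, not part of ReUrl; it assumes a bare path string and ignores authorities, drives, queries and hashes.

```javascript
// Split a bare path string into root, dirs and file.
// Everything after the last slash is the file; if the path ends
// in a slash, there is no file at all.
function splitPath (path) {
  const result = {}
  if (path.startsWith ('/')) {
    result.root = '/'
    path = path.slice (1)
  }
  const segments = path.split ('/')
  const file = segments.pop () // the text after the last slash, possibly ''
  if (segments.length) result.dirs = segments
  if (file !== '') result.file = file
  return result
}

console.log (splitPath ('/foo/bar/baz/')) // root '/', dirs foo, bar, baz, no file
console.log (splitPath ('/foo/bar/baz'))  // root '/', dirs foo, bar, file 'baz'
console.log (splitPath ('foo/bar'))       // no root, dirs foo, file 'bar'
```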

Setting Properties

Url and RawUrl objects are immutable; setting and removing components is therefore achieved via a set method that takes a patch object.

url.set (patch)

The _patch_ object may contain one or more of the keys **scheme**, **user**, **pass**, **host**, **port**, **drive**, **root**, **dirs**, **file**, **query** and/or **hash**. To remove a component, set its value in the patch to null. If present:

- **port** must be `null`, a string, or a number
- **dirs** must be an array of strings
- **root** may be anything; it is converted to `'/'` if truthy and interpreted as `null` otherwise
- all others must be `null` or a string.

```javascript
new Url ('//host/dir/file')
  .set ({ host:null, query:'q', hash:'h' })
  .toString () // => '/dir/file?q#h'
```

Resets

For security reasons, setting the **user** will remove **pass**, unless a value is supplied for it as well. Setting the **host** will remove **user**, **pass** and **port**, unless values are supplied for them as well.

```javascript
new Url ('http://joe:secret@example.com')
  .set ({ user:'jane' })
  .toString () // => 'http://jane@example.com'
```

```javascript
new Url ('http://joe:secret@localhost:8080')
  .set ({ host:'example.com' })
  .toString () // => 'http://example.com'
```

patch.percentCoded

The _patch_ may have an additional key **percentCoded** with a boolean value to indicate that strings in the patch contain percent escape sequences. This means that you can pass percent-_encoded_ values to Url.set by explicitly setting **percentCoded** to true. The values will then be decoded.

```javascript
var url = new Url ('//host/')
url = url.set ({ file:'%61bc-%25-sign', percentCoded:true })
url.file // => 'abc-%-sign'
url.toString () // => '//host/abc-%25-sign'
```

You can pass percent-_decoded_ values to RawUrl.set by explicitly setting **percentCoded** to false. Percent characters in values will then be encoded; specifically, they will be replaced with `%25`.

```javascript
var rawUrl = new RawUrl ('//host/')
rawUrl = rawUrl.set ({ file:'abc-%-sign', percentCoded:false })
rawUrl.file // => 'abc-%25-sign'
rawUrl.toString () // => '//host/abc-%25-sign'
```

**Note** that if no percentCoded value is specified, then Url.set assumes percentCoded to be _false_ whilst RawUrl.set assumes percentCoded to be _true_.

```javascript
var url = new Url ('//host/') .set ({ file:'%61bc' })
url.file // => '%61bc'
url.toString () // => '//host/%2561bc'
```

```javascript
var rawUrl = new RawUrl ('//host/') .set ({ file:'%61bc' })
rawUrl.file // => '%61bc'
rawUrl.toString () // => '//host/%61bc'
```
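
The reset rules for **user** and **host** can be sketched on plain objects. This is a minimal illustration of the rules as described above, not ReUrl's code: `setWithResets` is a hypothetical name, and validation and percent coding are left out.

```javascript
// Apply a patch to a plain record of URL components without mutating it.
function setWithResets (url, patch) {
  const result = { ...url }
  // Setting the user drops the password, unless the patch supplies one.
  if ('user' in patch && !('pass' in patch)) delete result.pass
  // Setting the host drops the credentials and port, unless supplied.
  if ('host' in patch)
    for (const key of ['user', 'pass', 'port'])
      if (!(key in patch)) delete result[key]
  // A null value removes a component, any other value replaces it.
  for (const [key, value] of Object.entries (patch)) {
    if (value == null) delete result[key]
    else result[key] = value
  }
  return Object.freeze (result)
}

const joe = { scheme:'http', user:'joe', pass:'secret', host:'localhost', port:8080 }
console.log (setWithResets (joe, { user:'jane' }))        // pass is gone, port is kept
console.log (setWithResets (joe, { host:'example.com' })) // only scheme and host remain
```

Returning a frozen copy mirrors the immutability of Url objects: the input record is left untouched.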

Conversions

url.toString ()

Converts an Url object to a string. Percent encodes only a minimal set of codepoints. The resulting string may contain non-ASCII codepoints.

```javascript
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toString () // => 'http://🌿🌿🌿/%7Bbraces%7D/hʌɪ'
```

url.toASCII (), url.toJSON (), url.href

Converts an Url object to a string that contains only ASCII code points. Non-ASCII codepoints in components will be percent encoded and/or punycoded.

```javascript
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toASCII () // => 'http://xn--8h8haa/%7Bbraces%7D/h%CA%8C%C9%AA'
```

url.toURI ()

Uses url.toASCII () to convert url to an RFC 3986 URI. Throws an error if url does not have a scheme, because URIs must always have a scheme.
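
For comparison, the built-in WHATWG `URL` class always serializes to ASCII, which corresponds to what toASCII is described as doing above. The snippet uses only the built-in class, not ReUrl, and `bücher.example` is an example host chosen for this illustration.

```javascript
const ascii = new URL ('http://bücher.example/{braces}/hʌɪ')

// The host is punycoded ...
console.log (ascii.hostname) // => 'xn--bcher-kva.example'

// ... and the path is percent-encoded as UTF-8 bytes.
console.log (ascii.pathname) // => '/%7Bbraces%7D/h%CA%8C%C9%AA'
```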

Normalisation

url.normalize (), url.normalise ()

Returns a new Url object by normalizing `url`. Among other things, this interprets `.` and `..` segments within the path and removes default ports and trivial usernames/passwords from the authority of `url`.

```javascript
new Url ('http://foo/bar/baz/./../bee') .normalize () .toString ()
// => 'http://foo/bar/bee'
```
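
The interpretation of `.` and `..` segments can be sketched as a single pass over the **dirs** array. `removeDotSegments` is a hypothetical helper for illustration, not the library's implementation; it assumes an absolute path, so a `..` at the top is simply dropped. The built-in `URL` class is shown alongside because it performs the same normalization while parsing.

```javascript
// Interpret '.' and '..' in a list of directory names of an absolute path.
function removeDotSegments (dirs) {
  const result = []
  for (const name of dirs) {
    if (name === '.') continue          // '.' refers to the current directory
    if (name === '..') result.pop ()    // '..' steps back up one directory
    else result.push (name)
  }
  return result
}

// The dirs of '/bar/baz/./../bee' are bar, baz, '.', '..' and the file is 'bee'.
console.log (removeDotSegments (['bar', 'baz', '.', '..'])) // => [ 'bar' ]

// The built-in parser also drops the default port 80 of http.
console.log (new URL ('http://foo:80/bar/baz/./../bee') .href) // => 'http://foo/bar/bee'
```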

Percent Coding

url.percentEncode ()

Returns a RawUrl object by percent-encoding the properties of `url` according to the Standard. Prevents double escaping of percent-encoded bytes in the case of RawUrl objects.

url.percentDecode ()

Returns an Url object by percent-decoding the properties of `url` if it is a RawUrl, and leaving them as-is otherwise.

Goto

url.goto (url2)

Returns a new Url object by 'extending' _url_ with _url2_, where _url2_ may be a string, an Url or a RawUrl object.

```javascript
new Url ('/foo/bar') .goto ('baz/index.html') .toString ()
// => '/foo/baz/index.html'
```

```javascript
new Url ('/foo/bar') .goto ('//host/path') .toString ()
// => '//host/path'
```

```javascript
new Url ('http://foo/bar/baz/') .goto ('./../bee') .toString ()
// => 'http://foo/bar/baz/./../bee'
```

If _url2_ is a string, it will be parsed with the scheme of _url_ as a fallback scheme. TODO: if _url_ has no scheme then …

```javascript
new Url ('file://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'file://host/c|/dir2/'
```

```javascript
new Url ('http://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'http://host/dir/c|/dir2/'
```
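
Judging only from the examples above, one way to picture goto is as follows: keep the components of _url_ that come before the first component of _url2_, then continue with _url2_. The sketch below works on plain objects and is an assumption made for illustration, not ReUrl's algorithm; `gotoSketch` and the `order` list are hypothetical, and user, pass, port and drive are left out.

```javascript
// Components in the order in which they appear in an URL-string.
const order = ['scheme', 'host', 'root', 'dirs', 'file', 'query', 'hash']

function gotoSketch (url, url2) {
  const first = order.findIndex (key => key in url2)
  if (first < 0) return { ...url }
  const result = {}
  // Keep what comes before the first component of url2 ...
  for (const key of order.slice (0, first))
    if (key in url) result[key] = url[key]
  // ... and take everything from there on from url2.
  for (const key of order.slice (first))
    if (key in url2) result[key] = url2[key]
  // A url2 that starts with dirs continues within the dirs of url.
  if (order[first] === 'dirs' && url.dirs)
    result.dirs = [...url.dirs, ...url2.dirs]
  return result
}

const fooBar = { root:'/', dirs:['foo'], file:'bar' } // '/foo/bar'
console.log (gotoSketch (fooBar, { dirs:['baz'], file:'index.html' }))     // '/foo/baz/index.html'
console.log (gotoSketch (fooBar, { host:'host', root:'/', file:'path' }))  // '//host/path'
```

Note that, as in the third example above, dot segments are carried over untouched; interpreting them is left to a separate normalization step.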

Base URLs

url.isBase ()

Returns a boolean, indicating if _url_ is a _base-URL_. What is and is not a base-URL depends on the _scheme_ of an URL. For example, `http`- and `file`-URLs that do not have a _host_ are not base-URLs.

url.force ()

Forcibly convert an Url to a base-URL according to this URL Specification, in accordance with the WHATWG Standard.

- In `file` URLs without hostname, the hostname will be set to `''`.
- For URLs that have a scheme being one of `http`, `https`, `ws`, `wss` or `ftp` and an absent or empty authority, the authority will be 'stolen from the first nonempty path segment'.
- In the latter case, an error is thrown if _url_ cannot be forced. This happens if it has no scheme, or if it has an empty host and no non-empty path segment.

```javascript
new Url ('http:foo/bar') .force () .toString ()
// => 'http://foo/bar'
```

```javascript
new Url ('http:/foo/bar') .force () .toString ()
// => 'http://foo/bar'
```

```javascript
new Url ('http://foo/bar') .force () .toString ()
// => 'http://foo/bar'
```

```javascript
new Url ('http:///foo/bar') .force () .toString ()
// => 'http://foo/bar'
```
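
The same forcing can be observed in the built-in WHATWG `URL` class, where it happens implicitly during parsing. The snippet below uses only the built-in class and is meant as a cross-check of the behaviour described above, not as a description of how ReUrl implements it.

```javascript
// For http URLs, the parser takes the first nonempty path segment as the host.
const inputs = ['http:foo/bar', 'http:/foo/bar', 'http://foo/bar', 'http:///foo/bar']
const forced = inputs.map (input => new URL (input) .href)
console.log (forced) // four times 'http://foo/bar'

// For file URLs without a host, the host becomes the empty string.
const forcedFile = new URL ('file:/dir/file')
console.log (forcedFile.href) // => 'file:///dir/file'
```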

Reference Resolution

url.genericResolve (base) (RFC 3986, strict)

Resolve an Url object _url_ against a base URL _base_ according to the **strict** reference resolution algorithm as defined in RFC 3986.

url.legacyResolve (base) (RFC 3986, non-strict)

Resolve an Url object _url_ against a base URL _base_ according to the **non-strict** reference resolution algorithm as defined in RFC 3986.

url.WHATWGResolve (base), aka. url.resolve

Resolve an Url object _url_ against a base URL _base_ in a way that is compatible with the error-correcting, forcing reference resolution algorithm as defined in the WHATWG Standard.
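
The difference between strict and non-strict resolution shows up when a reference repeats the scheme of the base. RFC 3986 (section 5.4.2) gives the example `http:g` against the base `http://a/b/c/d;p?q`: a strict resolver returns `http:g`, while a non-strict one ignores the repeated scheme and returns `http://a/b/c/g`. The snippet below shows what the built-in WHATWG `URL` class does with the same input; it does not use ReUrl.

```javascript
const base = 'http://a/b/c/d;p?q'

// For http, the WHATWG parser behaves like the non-strict algorithm:
// the repeated scheme is ignored and the reference is treated as relative.
const sameScheme = new URL ('http:g', base) .href
console.log (sameScheme) // => 'http://a/b/c/g'

// For a scheme without special treatment, the result matches the strict one.
const otherScheme = new URL ('foo:g', 'foo://a/b/c/d') .href
console.log (otherScheme) // => 'foo:g'
```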

Changelog

Version 1.0.0-rc.2

ReUrl now exposes three methods for reference resolution: genericResolve, legacyResolve and WHATWGResolve (also available as resolve).

License

MIT.

Enjoy!