Currently, links are validated with the following regular expression:
https://github.com/agile-learning-institute/mentorHub-mongodb/blob/9dc5ea5ec8b499ac7c9dacc2754a7cc02a44597d/src/mongosh/schemas/resources-1.0.2.json#L43
After some discussion and uncertainty about the intended meaning of that expression, it was decided that a new pattern should be created. Based primarily on RFC 3986, I propose this pattern:

The intention is as follows:

- The scheme must be `https`.
- The host name (here meaning everything that appears before the top-level domain, such as `.com`) may consist of one or more strings of 1-63 characters each (alphanumeric or a hyphen), separated by periods. The upper limit comes from RFC 1035.
- A top-level domain of 2-24 characters follows. The upper limit corresponds to the longest existing gTLD at the time of writing. Characters may be alphanumeric or a hyphen; the hyphen allows non-Latin-script gTLDs to be supported through their Punycode form (labels beginning with `xn--`).
- An optional port number (1-5 decimal digits) may follow, and it must be preceded by a colon.
- If there is more to the URI, it must start with a slash `/`.
- Following the slash, zero or more of the following may appear:
  - An unreserved character, defined as a letter, digit, `-`, `.`, `_`, or `~`. Note that the escape sequence `\w` is equivalent to `[a-zA-Z0-9_]` (in ASCII matching mode).
  - A percent-encoded character (`%` followed by two hexadecimal digits).
  - `sub-delims`, an additional set of permitted non-alphanumeric characters: `!`, `$`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `;`, and `=`.
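The rules above can be assembled into a single regular expression. The following Python sketch shows one possible reading of the list. It is not the pattern proposed in this issue, and the names `HOST_LABEL`, `TLD`, `PORT`, `PATH_CHAR`, `URL_PATTERN` and `is_valid_link` are hypothetical.

The sketch makes two assumptions. First, the slash-plus-characters group may repeat, so multi-segment paths such as `/a/b` are accepted. A strictly literal reading of the list would allow only one slash. Second, like the list, it does not forbid host labels that start or end with a hyphen.

```python
import re

# Hypothetical building blocks, one per rule in the list above.
HOST_LABEL = r"[A-Za-z0-9-]{1,63}"    # 1-63 alphanumerics or hyphens (RFC 1035 limit)
TLD = r"[A-Za-z0-9-]{2,24}"           # 2-24 chars; the hyphen permits "xn--" Punycode TLDs
PORT = r"(?::[0-9]{1,5})?"            # optional ":" followed by 1-5 decimal digits
UNRESERVED = r"[A-Za-z0-9._~-]"       # RFC 3986 unreserved characters
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"      # "%" followed by two hex digits
SUB_DELIMS = r"[!$&'()*+,;=]"         # RFC 3986 sub-delims
PATH_CHAR = f"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS})"

# Assumption: each "/" may be followed by zero or more path characters, and the
# slash-plus-characters group may repeat, so multi-segment paths are accepted.
URL_PATTERN = re.compile(
    rf"https://{HOST_LABEL}(?:\.{HOST_LABEL})*\.{TLD}{PORT}(?:/{PATH_CHAR}*)*"
)


def is_valid_link(url: str) -> bool:
    """Return True if the whole string matches the sketched pattern."""
    return URL_PATTERN.fullmatch(url) is not None
```

Here are two examples:

- `is_valid_link("https://www.example.com:8080/path/to%20page")` returns `True`.
- `is_valid_link("https://example.com/?q=1")` returns `False`, because `?` is not among the characters the list permits after the slash.

Building the expression from named pieces keeps each rule traceable to one bullet.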
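The note that `\w` is equivalent to `[a-zA-Z0-9_]` holds only when matching is ASCII-only, and not every engine defaults to that. For example, Python 3's `re` module matches Unicode word characters with `\w` on `str` patterns unless the `re.ASCII` flag is given. The short check below illustrates this. Whether the same caveat applies to the engine that ultimately evaluates the schema pattern should be confirmed in that engine's documentation.

```python
import re

# On str patterns, Python's \w is Unicode-aware by default.
unicode_word = re.compile(r"\w+")
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_word = re.compile(r"\w+", re.ASCII)

sample = "caf\u00e9_1"  # "café_1" with a precomposed "é"

print(unicode_word.fullmatch(sample) is not None)  # True: "é" counts as a word character
print(ascii_word.fullmatch(sample) is not None)    # False: "é" is outside [a-zA-Z0-9_]
```

Spelling the class out explicitly, as in `[A-Za-z0-9._~-]`, avoids depending on how a given engine defines `\w`.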