agile-learning-institute / mentorHub-mongodb

Mongo Database for Institute system
Fix link validation #99

Closed michquinn closed 7 months ago

michquinn commented 8 months ago

Currently, links are validated with the following regular expression:

https://github.com/agile-learning-institute/mentorHub-mongodb/blob/9dc5ea5ec8b499ac7c9dacc2754a7cc02a44597d/src/mongosh/schemas/resources-1.0.2.json#L43

After some discussion and uncertainty about the intended meaning of this expression, it was resolved that a new pattern should be created. Based primarily on RFC 3986, I propose this pattern:

^https:\/\/([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z-]{2,24}(:[0-9]{1,5})?(\/(([\w\-\.~]|%[a-fA-F0-9]{2}|[!\$&'\(\)\*\+,;=:@])*\/?)*)?

The intention is as follows:

  1. The scheme must be https
  2. The host name (here meaning everything that appears before the top-level domain, such as .com) consists of one or more labels, each 1-63 characters long (alphanumeric or a hyphen) and each followed by a period; the upper limit comes from RFC 1035
  3. A top-level domain of 2-24 characters, the upper limit corresponding to the longest existing gTLD as of this writing. The class as written, [a-zA-Z-], admits letters and a hyphen; the hyphen allows the "xn--" prefix used by "punycode" for non-Latin-script gTLDs, though punycode labels that contain digits would also need 0-9 in the class
  4. An optional port number (1-5 decimal digits), which must be preceded by a colon
  5. If there is more to the URI, it must start with a slash /
  6. Following the slash, 0 or more of the following may appear:
    • An unreserved character, defined as a letter, digit, -, ., _, or ~. Note that the escape sequence \w is equivalent to [a-zA-Z0-9_]
    • A percent-encoded character (% followed by a two-digit hexadecimal number)
    • sub-delims, an additional set of permitted non-alphanumeric characters
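
To make the points above concrete, here is a minimal Python sketch that exercises the proposed pattern against a few sample links. It is only an illustration, not the validator the repository uses: the helper names matched_prefix and matches_entire and the sample URLs are made up for this example, and re.ASCII is passed so that \w means [a-zA-Z0-9_] as described above. It also assumes the schema validator applies the pattern as an unanchored search; since the pattern has no trailing $, such a search accepts any string whose beginning matches.

```python
import re

# Pattern copied verbatim from the proposal above.
LINK_PATTERN = re.compile(
    r"^https:\/\/([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z-]{2,24}(:[0-9]{1,5})?"
    r"(\/(([\w\-\.~]|%[a-fA-F0-9]{2}|[!\$&'\(\)\*\+,;=:@])*\/?)*)?",
    re.ASCII,  # keep \w equal to [a-zA-Z0-9_], as the description states
)


def matched_prefix(url):
    """Return the part of url the pattern matches from the start, or None.

    Hypothetical helper: mimics an unanchored search, so trailing
    characters that the pattern does not cover are simply ignored.
    """
    m = LINK_PATTERN.search(url)
    return m.group(0) if m else None


def matches_entire(url):
    """Return True only if the whole string matches (as if '$' were appended)."""
    return LINK_PATTERN.fullmatch(url) is not None


# Illustrative inputs only; none of these come from the repository.
samples = [
    "https://example.com",
    "http://example.com",                       # wrong scheme
    "https://localhost",                        # no period before a TLD
    "https://example.com:8080/path/to%20file",  # port and percent-encoding
    "https://example.com/a b",                  # space is not a permitted character
    "https://example.xn--p1ai",                 # punycode TLD containing a digit
]

for url in samples:
    print(f"{url!r}: prefix={matched_prefix(url)!r}, entire={matches_entire(url)}")
```

Comparing the two helpers shows where an end anchor would change the outcome: a link with a space in its path still passes the prefix check but fails the whole-string check.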