Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
The specification for URL or Path says the field must be an HTTP or HTTPS URL or a POSIX relative path.
The regex is `^(?=^[^./~])(^((?!\.{2}).)*$).*$`, which accepts paths with backslashes (e.g. `\etc\passwd`) and non-HTTP schemes (e.g. `file:/etc/passwd`, `file://etc/passwd`, `file://localhost/etc/passwd`, `file:///etc/passwd`, `ftp://something`).
I suggest the following regex instead, broken down piece by piece:
^ — start of string
( — two alternatives, the POSIX path or the HTTP(S) URL
(?=[^./~]) — first character of POSIX path is not . / or ~
(?!file:) — must not start with file:
(
(?!/\.\./) — must not contain /../
(?!\\) — must not contain backslashes
(?!:\/\/) — must not contain URL-like schemes, ftp:// etc.
. — a character
)* — repeat to the end
|
https?:\/\/.* — or must start http:// or https://
)$ — end of string
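Put together, the pieces above can be tried out in Python roughly like this. It's a minimal sketch, not a reference implementation: the names `URL_OR_PATH` and `is_valid_url_or_path` are hypothetical, the pattern is simply the breakdown concatenated in order, and I've dropped the `\/` escapes, which Python doesn't need.

```python
import re

# Assembled from the breakdown above. re.VERBOSE ignores the whitespace
# and allows the inline comments.
URL_OR_PATH = re.compile(
    r"""
    ^                   # start of string
    (                   # two alternatives: POSIX path or HTTP(S) URL
        (?=[^./~])      # first character of a path is not . / or ~
        (?!file:)       # must not start with file:
        (
            (?!/\.\./)  # must not contain /../
            (?!\\)      # must not contain backslashes
            (?!://)     # must not contain URL-like schemes (ftp:// etc.)
            .           # a character
        )*              # repeat to the end
      |
        https?://.*     # or must start with http:// or https://
    )$                  # end of string
    """,
    re.VERBOSE,
)


def is_valid_url_or_path(value: str) -> bool:
    """Return True if value matches the suggested pattern (hypothetical helper)."""
    return URL_OR_PATH.match(value) is not None


if __name__ == "__main__":
    for candidate in ["data/file.csv", "https://example.com/data.csv",
                      "file:///etc/passwd", "ftp://something"]:
        print(f"{candidate!r}: {is_valid_url_or_path(candidate)}")
```

One caveat if you use this in Python: `$` also matches just before a trailing newline, so a stricter check might end the pattern with `\Z` instead.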
This blocks some POSIX-valid but very weird filenames like `weird://file.jpg` (`mkdir weird: && touch weird://file.jpg`) and `not\a\directory` (`touch not\\a\\directory && ls -l not\\a\\directory`). Then again, most URLs are also valid relative filenames, so in that sense the specification itself isn't valid.
It allows valid filenames like `somefile:name`, `c:/aoeu.bat`, ` /etc/passwd` (with a leading space), `example..jpg` and URLs like `http://localhost/../thing`.
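To compare the two patterns on the examples mentioned in this issue, here's a quick Python sketch. `OLD` is the regex quoted from the specification and `NEW` is the suggestion assembled from the breakdown (without the `\/` escapes, which Python doesn't need). The names `OLD`, `NEW`, `EXAMPLES` and `verdicts` are mine, just for illustration.

```python
import re

# Regex currently quoted from the specification.
OLD = re.compile(r"^(?=^[^./~])(^((?!\.{2}).)*$).*$")

# Suggested regex, assembled from the breakdown in this issue.
NEW = re.compile(
    r"^((?=[^./~])(?!file:)((?!/\.\./)(?!\\)(?!://).)*|https?://.*)$"
)

# Examples taken from the text of this issue.
EXAMPLES = [
    r"\etc\passwd",
    "file:/etc/passwd",
    "file://etc/passwd",
    "file://localhost/etc/passwd",
    "file:///etc/passwd",
    "ftp://something",
    "weird://file.jpg",
    r"not\a\directory",
    "somefile:name",
    "c:/aoeu.bat",
    " /etc/passwd",  # note the leading space
    "example..jpg",
    "http://localhost/../thing",
]


def verdicts(value: str) -> tuple[bool, bool]:
    """Return (accepted by OLD, accepted by NEW) for one candidate string."""
    return OLD.match(value) is not None, NEW.match(value) is not None


if __name__ == "__main__":
    print(f"{'value':<32} {'old':<6} {'new':<6}")
    for example in EXAMPLES:
        old_ok, new_ok = verdicts(example)
        print(f"{example!r:<32} {old_ok!s:<6} {new_ok!s:<6}")
```

Running it prints one row per example, so it's easy to add further edge cases to `EXAMPLES` and see where the two patterns disagree.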
Regex testing: https://regex101.com/r/GDV9eW/1