Open candleindark opened 1 year ago
well, it wouldn't be needed AFAIK if we just use a uuid or md5 of the url, and that is where we seem to be going
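For illustration, here is a minimal sketch of what "uuid or md5 of the url" could look like. The helper names `url_md5` and `url_uuid` are hypothetical, and deriving the UUID from the URL (rather than generating a random one per record) is an assumption about what is meant here.

```python
import hashlib
import uuid


def url_md5(url: str) -> str:
    """Hex MD5 digest of the URL string (hypothetical helper)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def url_uuid(url: str) -> uuid.UUID:
    """Deterministic, name-based UUID (version 5) for the URL (hypothetical helper)."""
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


url = "libarchive://deeply/nested/path::ftp:///archive.7z"
# Both identifiers contain only hex digits (and dashes for the UUID),
# so the special characters in the URL never reach a path or key.
print(url_md5(url))
print(url_uuid(url))
```

Either value depends only on the URL text, so an unusual URL yields a safe identifier without any sanitization step.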
@yarikoptic Frankly, I have never seen such URLs before. I filed the issue so that I can handle them later. Will these URLs complicate a solution for #146?
oh, I forgot that those aren't really following the W3C standard but are rather something git allows for: https://git-scm.com/docs/git-clone
When Git doesn’t know how to handle a certain transport protocol, it attempts to use the remote-<transport> remote helper, if one exists. To explicitly request a remote helper, the following syntax may be used:
<transport>::<address>
so we indeed should make the code first split away the transport (could probably be identified simply via a [^/]+:: regex, or made more specific -- see how git does it) and then process the address for harmonization.
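A rough sketch of that splitting step, assuming the `[^/]+::` idea is applied only at the start of the string. The function name `split_transport` is hypothetical, and this does not claim to reproduce how git itself detects remote helpers.

```python
import re
from typing import Optional, Tuple

# A run of non-slash characters at the very start, followed by "::".
# Non-greedy, so only the first "::" is treated as the separator.
_TRANSPORT_RE = re.compile(r"([^/]+?)::(.*)", re.DOTALL)


def split_transport(url: str) -> Tuple[Optional[str], str]:
    """Split "<transport>::<address>" into (transport, address).

    Returns (None, url) when the URL has no leading transport prefix.
    """
    m = _TRANSPORT_RE.match(url)
    if m is None:
        return None, url
    return m.group(1), m.group(2)


print(split_transport(
    "datalad-annex::file://{export}?type=directory&directory={{path}}"
    "&encryption=none&dladotgit=uncompressed"
))
print(split_transport("libarchive://deeply/nested/path::ftp:///archive.7z"))
```

With this pattern the datalad-annex:: URL is split into its transport and a file:// address, while the libarchive:// URL is returned unchanged, because its :: only appears after a slash.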
The 2nd one -- I am no longer sure where I picked it up, but that one is a "more standard" URL, since
❯ python -c 'from urllib.parse import urlparse;print(urlparse("libarchive://deeply/nested/path::ftp:///archive.7z"))'
ParseResult(scheme='libarchive', netloc='deeply', path='/nested/path::ftp:///archive.7z', params='', query='', fragment='')
so we can just leave it at that.
Overall, :: is also used heavily in fsspec for "chaining" URLs: https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining but I don't know if we should care about that until we see such URLs actually being used. @mih do you have constructs for datalad clone which would be such "chained" URLs for datalad-annex::?
As mentioned by @yarikoptic in https://github.com/datalad/datalad-registry/issues/125#issuecomment-1491109779, we may encounter URLs such as
datalad-annex::file://{export}?type=directory&directory={{path}}&encryption=none&dladotgit=uncompressed
and
libarchive://deeply/nested/path::ftp:///archive.7z
Provide a solution to sanitize them.