Open Chris00 opened 10 years ago
xml:base is also useful to resolve relative links in the post content. See also https://github.com/ocaml/platform-blog/issues/12#issuecomment-52685888
If you implement xml:base, be sure to read http://www.w3.org/TR/xmlbase/ and implement relative bases. Since 1.3.12 or so, Uri supports relative-relative resolution, so you can partially resolve against xml:base and emit XML without xml:base included, for processors that don't understand it and just want to resolve against the retrieval URL or self link.
For extra bonus points, solve the xml:base problem for everyone by releasing xmlmbase or something so I don't have to keep re-implementing it...
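To make the partial-resolution idea concrete, here is a minimal sketch, assuming the `uri` opam package is installed. The helper names `effective_base` and `resolve_link` and the example.com URLs are made up for illustration; only `Uri.resolve`, `Uri.of_string`, `Uri.to_string` and `Uri.empty` come from the library. It folds nested xml:base values (outermost first) into one base and resolves a link against it, passing "" as the default scheme.

```ocaml
(* Sketch only; requires the `uri` opam package.
   effective_base and resolve_link are hypothetical helpers, not part of Uri. *)

(* Fold a stack of xml:base values (outermost first) into one effective
   base: each inner base is resolved against the accumulated outer one. *)
let effective_base (bases : string list) : Uri.t =
  List.fold_left
    (fun acc b -> Uri.resolve "" acc (Uri.of_string b))
    Uri.empty bases

(* Resolve a link found in the content against the effective base. *)
let resolve_link ~bases link =
  Uri.resolve "" (effective_base bases) (Uri.of_string link)

let () =
  (* The outer base is absolute, so the resolved link is absolute. *)
  resolve_link ~bases:[ "http://example.com/blog/"; "2014/" ] "post.html"
  |> Uri.to_string |> print_endline;
  (* Only a relative base is known, so the result is expected to stay
     relative, ready to be resolved later against a retrieval URL. *)
  resolve_link ~bases:[ "2014/" ] "post.html"
  |> Uri.to_string |> print_endline
```

With the resolved links written back into the document, the xml:base attributes could then be dropped on output.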
@dsheets It would be nice if Uri.resolve were slightly better documented. I haven't read the code yet, but after some toplevel experiments I am not sure what the point of the scheme is.
Also, I don't know how you see it, but I think it is better for the parsed documents (Atom, ...) to contain the full URIs (i.e., resolved if possible), because the feeds can be merged, ...
What do you mean by "scheme"? The scheme component of a URI? The point is to specify the protocol or resolution method of the rest of the identifier...
As for resolution to absolute identifiers, it depends on what you are manipulating. Given only an XML stream using the Atom vocabulary, the best one can do is resolve with the contained URI bases and URIs. If they are absolute, you will get absolute URIs. If they are relative, you will get relative (but more precise) URIs. Sometimes you don't want to remove xml:base. Often you want to remove it so that other processors don't have to deal with it. If relative URIs are used, you can't resolve them until you have a base URI from the transport protocol or resource retrieval. If you have that information, I absolutely agree that you should use it to resolve relative URIs, at least if your processing is based on traversing links. If you are transforming the document but will re-serve it later, potentially from a different address, you should keep things relative.
I hope this makes sense. I think for most use cases of this library, the base or absolute retrieval URI will be known and should be used. If you are just writing a function to remove xml:base, you shouldn't use the retrieval identifier, though. I think in Atom's case, relative URLs may be resolved against the self link as well (but I haven't checked the spec). You may not want to resolve those.
And, yes, Uri.resolve should have more extensive documentation.
On Sun, 7 Dec 2014 11:03:13 -0800, David Sheets wrote:
What do you mean by "scheme"? The scheme component of a URI? The point is to specify the protocol or resolution method of the rest of the identifier...
(* Resolve a URI against a default scheme and base URI *)
val resolve : string -> t -> t -> t
“scheme” refers to the above sentence.
I'll read the rest later.
Ah, you need to provide a scheme here to direct the scheme-specific parts of resolution, so "" or "http" are typical. I'm not too happy about this part of the interface, but there are some scheme-dependent resolution rules for host normalization. Specifically, "http" (or "https") will lowercase the hostname per DNS, "file" will also remove "localhost", and "" will perform no host normalization. Unfortunately, there are other scheme dependencies, but they haven't been captured in the library yet.
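As a rough illustration of the three rules just listed, here is a stdlib-only sketch. `normalize_host` is a hypothetical function, not the library's code or API, and leaving every other scheme untouched is an assumption made only for this example.

```ocaml
(* Hypothetical helper, not part of the Uri library: it only mirrors the
   scheme-dependent host rules described in the comment above. *)
let normalize_host (scheme : string) (host : string) : string =
  match scheme with
  | "http" | "https" ->
    (* Hostnames are case-insensitive in DNS, so lowercase them. *)
    String.lowercase_ascii host
  | "file" ->
    (* Lowercase as well, and drop a "localhost" authority. *)
    let h = String.lowercase_ascii host in
    if h = "localhost" then "" else h
  | _ ->
    (* "" (and, by assumption here, any other scheme): no normalization. *)
    host

let () =
  List.iter
    (fun (scheme, host) ->
       Printf.printf "%S %S -> %S\n" scheme host (normalize_host scheme host))
    [ ("http", "Example.COM"); ("file", "localhost"); ("", "Example.COM") ]
```

This is why the first argument of Uri.resolve matters even when the base URI already looks complete: it selects which of these rules applies when the base carries no scheme of its own.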
I've been planning a major update to the interface for several months. Please do post issues, ideas, suggestions, questions, etc. to the issue tracker.
Do we want to capture some XML attributes for Atom feeds, for example xml:lang for content? Example