Open chrisguttandin opened 11 months ago
@chrisguttandin Thanks so much for reporting this and submitting a fix. 🤗 Turns out the relevant spec has a special-case rule for when the xml:base is an empty string:
RFC 3986 defines certain relative URI references, in particular the empty string and those of the form #fragment, as same-document references. Dereferencing of same-document references is handled specially. However, their use as the value of an xml:base attribute does not involve dereferencing, and XML Base processors should resolve them in the usual way. In particular, xml:base="" does not reset the base URI to that of the containing document.
It would be awesome if you could update your PR to implement that logic, if you're up for it.
Hi @danmactough, thanks for taking a look. I would be happy to make the fix spec compliant but I'm not sure what the spec is trying to say.
As far as I understand it it says an empty xml:base
should be ignored. At least it says it should not do dereferencing and it should not reset the base URI.
Sorry for the dumb question, but what would need to change in the implementation to make it spec compliant?
Hi @danmactough, could you please let me know what needs to be changed in order to make these changes spec compliant?
I noticed that a feed with an empty
xml:base=""
breaks the parsing algorithm in a strange way. I came across such a feed in the wild. This PR attempts to fix that. Please let me know if anything needs to be changed.I added a new test fixture which is a copy of
test/feeds/tpm.atom
. The only difference is that I addedxml:base=""
.The test that I added breaks with the new fixture. The additional checks fix the problem. I'm not sure though if it's the best way to do this as I'm not familiar with the code base. Please let me know if it should be done differently.