danmactough / node-feedparser

Robust RSS, Atom, and RDF feed parsing in Node.js
Other
1.97k stars 192 forks source link

Handle empty xml:base #294

Open chrisguttandin opened 11 months ago

chrisguttandin commented 11 months ago

I noticed that a feed with an empty xml:base="" breaks the parsing algorithm in a strange way. I came across such a feed in the wild. This PR attempts to fix that. Please let me know if anything needs to be changed.

I added a new test fixture which is a copy of test/feeds/tpm.atom. The only difference is that I added xml:base="".

The test that I added breaks with the new fixture. The additional checks fix the problem. I'm not sure though if it's the best way to do this as I'm not familiar with the code base. Please let me know if it should be done differently.

danmactough commented 11 months ago

@chrisguttandin Thanks so much for reporting this and submitting a fix. 🤗 Turns out the relevant spec has a special-case rule for when the xml:base is an empty string:

RFC 3986 defines certain relative URI references, in particular the empty string and those of the form #fragment, as same-document references. Dereferencing of same-document references is handled specially. However, their use as the value of an xml:base attribute does not involve dereferencing, and XML Base processors should resolve them in the usual way. In particular, xml:base="" does not reset the base URI to that of the containing document.

It would be awesome if you could update your PR to implement that logic, if you're up for it.

chrisguttandin commented 11 months ago

Hi @danmactough, thanks for taking a look. I would be happy to make the fix spec compliant but I'm not sure what the spec is trying to say.

As far as I understand it it says an empty xml:base should be ignored. At least it says it should not do dereferencing and it should not reset the base URI.

Sorry for the dumb question, but what would need to change in the implementation to make it spec compliant?

chrisguttandin commented 5 months ago

Hi @danmactough, could you please let me know what needs to be changed in order to make these changes spec compliant?