akrennmair / newsbeuter

Newsbeuter is an open-source RSS/Atom feed reader for text terminals.
http://www.newsbeuter.org/
MIT License

Correct dc:creator in rss_10_parser.cpp #570

Closed soundsc closed 7 years ago

soundsc commented 7 years ago

Change item field creator to dc:creator. This corresponds to the Dublin Core specification as well as real-world usage. See issue #143.

coveralls commented 7 years ago

Coverage remained the same at 36.741% when pulling f6cd116bef2539887d6c84bd3aeb847d3384c45e on soundsc:patch-1 into d1c116d570f13276c21b62c215c3c3f8da69d576 on akrennmair:master.

Minoru commented 7 years ago

Even though it's closed, I think it might be useful to explain why we have "creator" without "dc:" and it still works. (If anything, I might forget this stuff one day, and it'll be good to have it written down so that I can link to it.)

First of all, the "dc" part is not fixed: it can be named however the author of the document wants. The name is declared with an xmlns:<prefix> attribute, usually on the document root. The prefix is just an alias for a namespace URI that identifies which set of definitions is meant. Dublin Core is one such set of definitions, and its URI is http://purl.org/dc/elements/1.1/. The following two documents have equivalent meaning:

<?xml version="1.0" encoding="utf-8"?> 

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns="http://purl.org/rss/1.0/"
> 

  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <dc:publisher>A big publishing house</dc:publisher>

<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:our_custom_name_for_dublin_core="http://purl.org/dc/elements/1.1/"
  xmlns="http://purl.org/rss/1.0/"
> 

  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <our_custom_name_for_dublin_core:publisher>A big publishing house</our_custom_name_for_dublin_core:publisher>

They differ only in the alias used for Dublin Core.
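
To check this equivalence yourself, here is a minimal sketch in Python. It uses the standard library's ElementTree instead of libxml2, purely because it needs no extra dependencies; both parsers resolve prefixes to namespace URIs. The closing tags, the TEMPLATE string, and the channel_child_tags helper are my additions for illustration, since the snippets above are cut off after the publisher element.

```python
import xml.etree.ElementTree as ET

DC_URI = "http://purl.org/dc/elements/1.1/"
RSS_URI = "http://purl.org/rss/1.0/"

# The document from the comment above, completed with closing tags so it parses.
# {prefix} is filled in with whatever alias we want to use for Dublin Core.
TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:{prefix}="http://purl.org/dc/elements/1.1/"
  xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <{prefix}:publisher>A big publishing house</{prefix}:publisher>
  </channel>
</rdf:RDF>
"""


def channel_child_tags(prefix):
    """Parse the template with the given prefix and list the channel's child tags."""
    root = ET.fromstring(TEMPLATE.format(prefix=prefix).encode("utf-8"))
    channel = root.find("{%s}channel" % RSS_URI)
    return [child.tag for child in channel]


tags_dc = channel_child_tags("dc")
tags_custom = channel_child_tags("our_custom_name_for_dublin_core")

# ElementTree reports tags as "{namespace-uri}local-name"; the prefix is gone.
print(tags_dc)
print(tags_custom)
```

Both calls produce the same list, because after parsing only the namespace URI and the local name remain.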

Second, and most relevant to this PR: we use libxml2 for parsing feeds, and it resolves these prefixes to their respective namespace URIs. So when Newsbeuter processes the parsed result, it doesn't see dc:publisher or our_custom_name_for_dublin_core:publisher; it sees a node with the name publisher and the namespace http://purl.org/dc/elements/1.1/. That URI is defined in Newsbeuter as the string constant DC_URI. The name and the namespace URI are exactly what's passed to our node_is method.
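
For anyone who wants to see the idea in code, below is a rough sketch of a check that takes a local name and a namespace URI. It is written in Python with ElementTree rather than C++ with libxml2, and this node_is is a hypothetical analogue, not Newsbeuter's actual method or signature. The FEED document, including its "anything" prefix and the example.com URL, is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

DC_URI = "http://purl.org/dc/elements/1.1/"
RSS_URI = "http://purl.org/rss/1.0/"


def node_is(node, name, ns_uri=None):
    """Hypothetical analogue of a name-plus-namespace check.

    Compares the local name and the namespace URI; the prefix used in the
    source document plays no role.
    """
    if ns_uri is None:
        return node.tag == name
    return node.tag == "{%s}%s" % (ns_uri, name)


# Made-up feed that binds Dublin Core to the prefix "anything" instead of "dc".
FEED = b"""<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:anything="http://purl.org/dc/elements/1.1/"
  xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/post">
    <title>Example</title>
    <anything:creator>Jane Doe</anything:creator>
  </item>
</rdf:RDF>"""

root = ET.fromstring(FEED)
item = root.find("{%s}item" % RSS_URI)

# Select the creator by local name "creator" plus DC_URI, never by "dc:creator".
creators = [child.text for child in item if node_is(child, "creator", DC_URI)]
print(creators)
```

The lookup finds the creator even though the string "dc:" appears nowhere in the feed or in the code, which is the same reason the existing check in rss_10_parser.cpp works with a bare "creator".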

Hope this unveils the mystery of how this all works despite "dc:" never being mentioned in the code.