andrew-thox / pb-journalist

Responsible for scraping sites
Eclipse Public License 1.0

Dublin Core Tags #15

Open andrew-thox opened 8 years ago

andrew-thox commented 8 years ago

The Independent are using both dc:creator tags and <author> tags. They put the same content in each, but we should keep an eye out for what other outlets are doing. New Statesman are solely using Dublin Core tags.

dc:pubDate does have a standard format, but it differs between RSS 1.0 (ISO 8601) and RSS 2.0 (RFC 2822), and some people are using ISO 8601 in RSS 2.0. Immensely unhelpful!

I think as a preference we should use the DC tags where they exist.

andrew-thox commented 8 years ago

I'm talking shit.

RSS 2.0 has pubDate, which is RFC 822/2822. RSS 1.0 has dc:date, which is ISO 8601.
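Since feeds mix the two date formats in practice, the scraper probably has to accept either one regardless of which tag the value came from. A rough sketch in Python, assuming only the standard library; `parse_feed_date` is a hypothetical helper name, not something that exists in this repo:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse a feed date given in RFC 822/2822 or ISO 8601 form."""
    text = text.strip()
    try:
        # RFC 822/2822 style, e.g. "Sat, 07 Sep 2002 00:00:01 GMT"
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Not an RFC 2822 date; fall through and try ISO 8601.
        pass
    # Older Python versions reject a trailing "Z", so normalise it
    # to an explicit UTC offset before parsing.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    # Raises ValueError if the text is in neither format.
    return datetime.fromisoformat(text)
```

Trying RFC 2822 first and ISO 8601 second means a feed that puts an ISO 8601 date in pubDate still parses. Anything in neither format raises ValueError, so odd outlets show up as errors rather than silently wrong dates.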

<author> should contain an email address, but this usage is not especially common. dc:creator should contain the name of the author.
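The "prefer DC tags where they exist" rule could look something like the following for the author field. This is only a sketch, assuming each feed item has already been parsed into an `xml.etree.ElementTree` element; `item_author` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set, as declared in feeds.
DC_NS = "http://purl.org/dc/elements/1.1/"


def item_author(item):
    """Return dc:creator if present and non-empty, else author, else None."""
    creator = item.findtext(f"{{{DC_NS}}}creator")
    if creator and creator.strip():
        return creator.strip()
    # Fall back to the plain RSS 2.0 author element, which may hold
    # an email address, a name, or both depending on the outlet.
    author = item.findtext("author")
    if author and author.strip():
        return author.strip()
    return None
```

For an outlet like The Independent, which fills in both tags, this returns the dc:creator value. For feeds with only the plain author tag it returns whatever that tag holds, so the caller may still need to tell emails and names apart.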