cboettig / labnotebook

:notebook: Source code and version history for my online lab notebook
http://www.carlboettiger.info
Creative Commons Zero v1.0 Universal

adjust html templates to polyglot standards for XML validation #21

Closed cboettig closed 11 years ago

cboettig commented 11 years ago

HTML5 allows tags with no content to be left unclosed, e.g. <meta charset="utf-8"> instead of <meta charset="utf-8" />, but this is not valid XML.

See:

Note that tags with no content must be closed with /> (like the meta example), while tags with content must get a normal closing tag (<li> item </li>, etc.).

The HTML5 spec is kinda logical, since the closing syntax is redundant, even though leaving a tag unclosed is not valid XML; see the SO question for discussion.
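
A rough way to spot the offending tags in a template is sketched below in base R. It is only a regex heuristic under stated assumptions, not a real parser: find_unclosed_void() and the sample page string are hypothetical names made up for illustration, and the pattern will misfire on attribute values that contain a ">" character.

```r
# Void elements defined by HTML5; they never have content or an end tag.
void_tags <- c("area", "base", "br", "col", "embed", "hr", "img", "input",
               "link", "meta", "param", "source", "track", "wbr")

# Hypothetical helper: return every void-element tag in `html`
# that does not end in "/>" and so would break an XML parser.
find_unclosed_void <- function(html) {
  pattern <- paste0("<(", paste(void_tags, collapse = "|"), ")\\b[^>]*>")
  tags <- regmatches(html, gregexpr(pattern, html,
                                    ignore.case = TRUE, perl = TRUE))[[1]]
  tags[!grepl("/>$", tags)]
}

# Made-up sample: the meta tag is unclosed, the link tag is self-closed.
page <- '<head><meta charset="utf-8"><link rel="stylesheet" href="a.css" /></head>'
unclosed <- find_unclosed_void(page)
print(unclosed)
```

On this sample only the meta tag is flagged; the self-closed link tag passes.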

cboettig commented 11 years ago

Basic XML parsing is now possible, e.g. in R:

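library(XML)
library(RCurl)
# Fetch the rendered page as a string
tt <- getURLContent("http://www.carlboettiger.info/2012/10/25/stochastic-dynamic-programming-with-gaussian-process-approx.html")
# Parse strictly as XML; this fails if the markup is not well-formed
doc <- xmlParse(tt, asText = TRUE)
# "x" serves as a prefix for the document's default (XHTML) namespace
getNodeSet(doc, "//x:meta", namespaces = "x")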

Somehow the <article> tag is being omitted from my HTML generation even though it is part of the template. Validation requires that meta tags appear only in <head> and not in <body> (but they should still use property and not name for RDFa properties). Validation also requires that all images have alt text, which the markdown parser does not provide. Otherwise the page validates as HTML5.
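
The missing alt text can be tracked down with the same XML package once a page parses. This is a minimal sketch, assuming the page declares the standard XHTML namespace; the inline page string is a made-up stand-in for a generated notebook page, not real output from the site.

```r
library(XML)

# Hypothetical XHTML fragment standing in for a generated notebook page.
page <- '<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <img src="a.png" alt="Figure 1" />
  <img src="b.png" />
</body></html>'

doc <- xmlParse(page, asText = TRUE)

# Bind the prefix "x" to the XHTML namespace so XPath can match the elements.
ns <- c(x = "http://www.w3.org/1999/xhtml")

# Select <img> nodes that have no alt attribute and collect their src values.
missing_alt <- getNodeSet(doc, "//x:img[not(@alt)]", namespaces = ns)
srcs <- vapply(missing_alt, xmlGetAttr, character(1), name = "src")
print(srcs)
```

Here only b.png is reported, since a.png already carries an alt attribute.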

Ignoring the declared content-type, the page will nearly validate as XML. (Had to change onClick in the Google tracker JavaScript to lowercase onclick.)

HTML5 needs lang= declared in the <html> tag (while it is tolerant of xml:lang), but XML doesn't want to see lang in the <html> tag. Perhaps it can go somewhere else and still be valid HTML5? (Using a meta tag for this is now obsolete.)