Open theseanything opened 1 year ago
This doesn't only affect xlsx, but also docx, pptx, and similar document types.
To add to this, it would be nice to be able to have more granularity over which XML is parsed. For example, we use an `OnXML` handler to follow links in an XML sitemap, but our site contains many SVGs (`image/svg+xml`) and RDF files (`application/rdf+xml`), which are also parsed unnecessarily.
The `handleOnXML` function attempts to parse responses with the content type `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`. This is because the function looks for any mention of `xml` in the content type. This results in a parse error when `xmlquery.Parse()` is called (for example: `encoding/xml.SyntaxError {Msg: "illegal character code U+0003", Line: 1}`). XLSX files are packaged as a zip archive, so they can't be parsed directly as XML.
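
To illustrate why this happens, here is a minimal sketch. It is not the library's actual code: `looksLikeXML` is a hypothetical stand-in for the substring check described above, and `isZip` is a hypothetical helper that looks for the ZIP local file header signature (`PK\x03\x04`). The third byte of that signature is `0x03`, which would be consistent with the `U+0003` reported in the error.

```go
package main

import (
	"bytes"
	"strings"
)

// looksLikeXML is a simplified, hypothetical version of a check that treats
// any content type mentioning "xml" as XML. It is not the library's code.
func looksLikeXML(contentType string) bool {
	return strings.Contains(strings.ToLower(contentType), "xml")
}

// zipSignature is the 4-byte signature that starts a ZIP local file header.
var zipSignature = []byte{'P', 'K', 0x03, 0x04}

// isZip reports whether body begins with the ZIP signature, which is how
// OOXML packages such as xlsx, docx and pptx normally start.
func isZip(body []byte) bool {
	return bytes.HasPrefix(body, zipSignature)
}

// officeTypes lists OOXML content types that contain the substring "xml"
// even though their payload is a zip archive.
var officeTypes = []string{
	"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
	"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	"application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
```

With this sketch, `looksLikeXML` returns `true` for every entry in `officeTypes`, while `isZip` returns `true` for the body of such a file, so the XML parser is handed binary data.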
It would be ideal not to try to parse these files, possibly by being more explicit about which content types we consider to be XML.
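
One possible shape for a more explicit check is sketched below. This is only an assumption about how it could look, not a proposed patch: `xmlMediaTypes` and `isXMLContentType` are hypothetical names, and the default allowlist (`text/xml`, `application/xml`) is a guess at a sensible baseline. The optional `extra` argument lets a caller opt in to types such as `image/svg+xml` instead of getting them implicitly.

```go
package main

import (
	"mime"
	"strings"
)

// xmlMediaTypes is a hypothetical default allowlist of media types that are
// treated as XML. Anything not listed here is skipped unless opted in.
var xmlMediaTypes = map[string]bool{
	"text/xml":        true,
	"application/xml": true,
}

// isXMLContentType reports whether the Content-Type header value names an
// allowed XML media type. Parameters such as "charset" are ignored.
// Callers may pass extra media types (for example "image/svg+xml") to opt in.
func isXMLContentType(header string, extra ...string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		// Fall back to the text before the first ';' for malformed headers.
		mediaType = strings.ToLower(strings.TrimSpace(strings.SplitN(header, ";", 2)[0]))
	}
	if xmlMediaTypes[mediaType] {
		return true
	}
	for _, e := range extra {
		if strings.EqualFold(mediaType, e) {
			return true
		}
	}
	return false
}
```

Under this sketch, `isXMLContentType("application/xml; charset=utf-8")` is accepted, while the xlsx, SVG and RDF content types are rejected unless the caller lists them explicitly.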