The type sniffer looks for an "<?xml" string and assumes any HTML without it
is text/html, but the XML declaration is optional, and the W3C guidelines
actually recommend avoiding it in XHTML for compatibility with older browsers
(http://www.w3.org/TR/xhtml1/guidelines.html).
I've hacked up my local copy with this patch, which seems to improve things:
--- skipfish-1.07b/analysis.c 2010-03-20 02:47:46.000000000 +0000
+++ b/analysis.c 2010-03-21 19:44:45.623787778 +0000
@@ -1944,7 +1944,13 @@
inl_strcasestr(sniffbuf, (u8*)"<h1") ||
inl_strcasestr(sniffbuf, (u8*)"<li") ||
inl_strcasestr(sniffbuf, (u8*)"href=")) {
- res->sniff_mime_id = MIME_ASC_HTML;
+ if (inl_strcasestr(sniffbuf, (u8*)" xmlns=\"http://www.w3.org/1999/xhtml\"") &&
+ inl_strcasestr(sniffbuf, (u8*)" xml:lang=") &&
+ !inl_strcasestr(sniffbuf, (u8*)" lang="))
+ res->sniff_mime_id = MIME_XML_XHTML;
+ else
+ res->sniff_mime_id = MIME_ASC_HTML;
+
return;
}
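
For anyone who wants to try the heuristic outside of skipfish, here is a minimal standalone sketch of the same three checks. The names ci_strstr() and looks_like_xhtml() are made up for illustration; ci_strstr() is only a stand-in for skipfish's inl_strcasestr() and is assumed to behave like a case-insensitive strstr(). This is not the actual analysis.c code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skipfish's inl_strcasestr(): case-insensitive
   substring search. Returns a pointer to the first match, or NULL. */
static const char *ci_strstr(const char *hay, const char *needle) {
  size_t n = strlen(needle);
  if (n == 0) return hay;
  for (; *hay; hay++) {
    size_t i = 0;
    /* Compare until a mismatch, the end of needle, or the end of hay. */
    while (i < n && hay[i] &&
           tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
      i++;
    if (i == n) return hay;
  }
  return NULL;
}

/* Returns 1 when the buffer has the XHTML namespace and xml:lang but no
   plain lang attribute (the same three conditions as the patch), else 0. */
static int looks_like_xhtml(const char *buf) {
  assert(buf != NULL);
  return ci_strstr(buf, " xmlns=\"http://www.w3.org/1999/xhtml\"") != NULL &&
         ci_strstr(buf, " xml:lang=") != NULL &&
         ci_strstr(buf, " lang=") == NULL;
}
```

Note that a page carrying both lang and xml:lang, which the same W3C guidelines suggest for compatibility, would still be classified as text/html by this check, so the last condition may be too strict.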
Original issue reported on code.google.com by flussence on 21 Mar 2010 at 8:04