False positive MIME warnings on xhtml files served as application/xhtml+xml

The type sniffer looks for an "<?xml" string and assumes that any HTML without
it is text/html. The XML declaration is optional, though, and the W3C's XHTML
1.0 compatibility guidelines actually suggest leaving it out
(http://www.w3.org/TR/xhtml1/guidelines.html).

I've hacked up my local copy with this patch, which seems to improve things:

--- skipfish-1.07b/analysis.c   2010-03-20 02:47:46.000000000 +0000
+++ b/analysis.c    2010-03-21 19:44:45.623787778 +0000
@@ -1944,7 +1944,13 @@
         inl_strcasestr(sniffbuf, (u8*)"<h1") ||
         inl_strcasestr(sniffbuf, (u8*)"<li") ||
         inl_strcasestr(sniffbuf, (u8*)"href=")) {
-      res->sniff_mime_id = MIME_ASC_HTML;
+      if (inl_strcasestr(sniffbuf, (u8*)" xmlns=\"http://www.w3.org/1999/xhtml\"") &&
+          inl_strcasestr(sniffbuf, (u8*)" xml:lang=") &&
+          !inl_strcasestr(sniffbuf, (u8*)" lang="))
+        res->sniff_mime_id = MIME_XML_XHTML;
+      else
+        res->sniff_mime_id = MIME_ASC_HTML;
+
       return;
     }
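
For anyone who wants to try the heuristic outside of skipfish, here is a rough standalone sketch of the same check. It is not skipfish code: the enum, ci_strstr() and sniff_markup() are names made up for illustration, ci_strstr() only stands in for inl_strcasestr(), and the three HTML markers are just the ones visible in the patch context above (the real condition in analysis.c may test more).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for skipfish's MIME ids (not the real constants). */
enum sniff_mime { SNIFF_UNKNOWN, SNIFF_ASC_HTML, SNIFF_XML_XHTML };

/* Portable case-insensitive substring search; assumed to behave roughly
   like inl_strcasestr() for the purposes of this sketch. */
static const char *ci_strstr(const char *hay, const char *needle) {
  size_t n = strlen(needle);
  if (n == 0) return hay;
  for (; *hay; hay++) {
    size_t i = 0;
    while (i < n && hay[i] &&
           tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
      i++;
    if (i == n) return hay;
  }
  return NULL;
}

/* Classify a NUL-terminated sniff buffer as XHTML, HTML, or neither. */
enum sniff_mime sniff_markup(const char *buf) {
  /* Only the markers shown in the patch context are checked here. */
  if (!ci_strstr(buf, "<h1") && !ci_strstr(buf, "<li") &&
      !ci_strstr(buf, "href="))
    return SNIFF_UNKNOWN;

  /* XHTML namespace plus xml:lang, and no plain lang attribute. */
  if (ci_strstr(buf, " xmlns=\"http://www.w3.org/1999/xhtml\"") &&
      ci_strstr(buf, " xml:lang=") &&
      !ci_strstr(buf, " lang="))
    return SNIFF_XML_XHTML;

  return SNIFF_ASC_HTML;
}
```

Note that a document carrying both xml:lang and lang still falls through to the HTML case, the same as in the patch, so this is a conservative guess rather than a complete XHTML detector.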

Original issue reported on code.google.com by flussence on 21 Mar 2010 at 8:04
