It looks like the parser is treating it as an entity. That is correct for XML,
but the server wants the page identified as HTML, and parsed as HTML this would
work. So we are sniffing the document type wrong.
Original comment by classi...@floodgap.com
on 21 Jan 2012 at 1:47
Trying...
Connected to newsmax.com.
Escape character is '^]'.
GET /m HTTP/1.0
Host: www.newsmax.com
Connection: close
HTTP/1.1 200 OK
Cache-Control: no-cache,private, no-store, must-revalidate
Content-Length: 8077
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
Set-Cookie: CMSPreferredCulture=en-US; expires=Mon, 21-Jan-2013 01:45:20 GMT; path=/
Set-Cookie: ASP.NET_SessionId=d1u3e245ilhwhc550zbeoham; path=/; HttpOnly
X-Powered-By: ASP.NET
X-UA-Compatible: IE=7
Date: Sat, 21 Jan 2012 01:45:19 GMT
Connection: close
Original comment by classi...@floodgap.com
on 21 Jan 2012 at 1:47
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
Original comment by classi...@floodgap.com
on 21 Jan 2012 at 1:48
Current suspect: htmlparser/src/nsParser.cpp:DetermineParseMode
We'll throw a breakpoint in there when we're ready to debug this.
Original comment by classi...@floodgap.com
on 21 Jan 2012 at 2:51
Actually, the MIME type detection is not failing, because newsmax declares
itself as XML:
<!-- Mobile Meta Tags -->
<meta http-equiv="Content-type" content="application/xhtml+xml; charset=utf-8" />
The only way around this is to relax the parser. Yuck.
Original comment by classi...@floodgap.com
on 1 Feb 2012 at 2:37
Altering expat so that it keeps parsing after XML_TOK_INVALID leads to
"success", but leaves holes in the page.
Maybe the simplest way is just to force application/xhtml+xml to be parsed as
HTML. This is wrong, but no more wrong than other hacks we do.
Original comment by classi...@floodgap.com
on 1 Feb 2012 at 3:26
This is what we did, and now the site works.
Let's see if this breaks anything.
Original comment by classi...@floodgap.com
on 19 Feb 2012 at 5:07
It breaks about: (since about: needs to be parsed as XHTML). Maybe we add an
exception for this.
Original comment by classi...@floodgap.com
on 4 Mar 2012 at 2:47
Implemented the better solution from issue 189: fudge content types in
HttpChannel::ProcessNormal(). Since about: is loaded from jar:, its content
type is not changed, and it is parsed as proper XHTML. Since the newsmax page
loads from the network, its content type is changed and it is parsed as HTML.
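
For illustration only, the scheme-dependent behaviour described here can be
sketched roughly as follows. This is not the actual patch: FudgeContentType is
a hypothetical standalone helper, not a Mozilla API, and it assumes a bare MIME
type (no "; charset=..." parameter) and a lowercase URL scheme.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of the Mozilla tree): returns the content
// type the parser should see for a document loaded from `url`.
// The returned type is always lowercased.
std::string FudgeContentType(const std::string& url, std::string contentType) {
    // Lowercase the type so the comparison below is case-insensitive.
    std::transform(contentType.begin(), contentType.end(), contentType.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Only documents fetched over HTTP(S) are considered "from the network".
    const bool fromNetwork =
        url.rfind("http://", 0) == 0 || url.rfind("https://", 0) == 0;

    // Network pages claiming to be XHTML are handed to the HTML parser.
    if (fromNetwork && contentType == "application/xhtml+xml") {
        return "text/html";
    }

    // Everything else (for example jar: documents) keeps its declared type.
    return contentType;
}
```

With this sketch, an http:// URL declared as application/xhtml+xml comes back
as text/html, while a jar: URL with the same declared type is left alone and
would still go through the strict XML parser.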
Original comment by classi...@floodgap.com
on 5 Mar 2012 at 12:37
Original comment by classi...@floodgap.com
on 19 Oct 2012 at 4:49
Original issue reported on code.google.com by
classi...@floodgap.com
on 21 Jan 2012 at 1:37