garretus / phpquery

Automatically exported from code.google.com/p/phpquery

cannot find contents after <head> block if document does not contain open <html> tag #93

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Stumbled on this one trying to scrape a government site, of all things. The bad page mysteriously has no opening <html> tag (but does have a </html> at the end!). I'm not sure whether this should count as an issue, but since it worked fine in the July 2008 version (.9.1), I figured I would submit a report.

What steps will reproduce the problem?
1. Create a document with <head> and <body> blocks, but do NOT wrap it in an <html> block.
2. Try to query inside the body block, or just print pq();

What is the expected output? 
All content, including any blocks after the <head> block. In version .9.1, this worked.

What do you see instead?
Nothing. The parser does not find or recognize <body> or any other block after the <head>.

Test code:

$doc = '<head><title>SomeTitle</title>
</head>
<body bgcolor="#ffffff" text="#000000" topmargin="1" leftmargin="0">blah
</body>';
$pq = phpQuery::newDocument($doc);
echo $pq;

Of course, I can work around this in my own PHP for this version by prepending '<html>' before passing the markup to phpQuery, but it worked before, and it seems like it still should.
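For anyone hitting the same problem, the workaround could look roughly like the sketch below. The helper name ensureHtmlWrapper is hypothetical and not part of phpQuery; the only phpQuery call used is phpQuery::newDocument() from the test code above. It assumes the markup has no doctype or other content that must stay ahead of the <html> tag, and the phpQuery call only runs if the library has already been loaded.

```php
<?php
// Hypothetical helper (not part of phpQuery): add the missing <html>
// wrapper before the markup is handed to the parser.
function ensureHtmlWrapper(string $markup): string
{
    // Matches an opening tag such as "<html>" or "<html lang=...>".
    // A closing "</html>" does not match, because "<" is followed by "/".
    if (preg_match('/<html[\s>]/i', $markup)) {
        return $markup;
    }

    // No opening tag found: prepend one, and append a closing tag
    // only if the markup does not already end the document itself.
    $wrapped = '<html>' . $markup;
    if (stripos($markup, '</html>') === false) {
        $wrapped .= '</html>';
    }
    return $wrapped;
}

// Same shape as the failing document from the report.
$doc = '<head><title>SomeTitle</title>
</head>
<body bgcolor="#ffffff" text="#000000" topmargin="1" leftmargin="0">blah
</body>';

$fixed = ensureHtmlWrapper($doc);

// Assumption: phpQuery was loaded elsewhere (for example via require_once).
if (class_exists('phpQuery')) {
    echo phpQuery::newDocument($fixed);
}
```

This only hides the symptom on the caller's side; the regression relative to .9.1 is still there for any document that lacks the opening tag.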

Original issue reported on code.google.com by joey...@gmail.com on 7 Jan 2009 at 6:10

GoogleCodeExporter commented 8 years ago

Original comment by tobiasz....@gmail.com on 7 Jan 2009 at 10:34