Closed chfoo closed 10 years ago
Pager URLs should only be POSTed with the pager params. Possible solution: reject URLs containing ms_lc_lo when scraping the fragments.
WARC/1.0
WARC-Type: request
WARC-Target-URI: http://tom-schreuder.hyves.nl/?xmlHttp=1&module=pager&action=showPage&name=ms_lc_lo
Content-Type: application/http;msgtype=request
WARC-Date: 2013-11-19T16:27:40Z
WARC-Record-ID: <urn:uuid:6565ab4c-dbe4-4aa3-ab90-9d6bb3d5eb3b>
WARC-IP-Address: 94.100.127.68
WARC-Warcinfo-ID: <urn:uuid:2927591f-7b56-4e62-8682-c4bca829e422>
WARC-Block-Digest: sha1:KXAEHZ5I6IRSFA77LEKHD4XYY3TUQVRD
Content-Length: 385

GET /?xmlHttp=1&module=pager&action=showPage&name=ms_lc_lo HTTP/1.1
Referer: http://tom-schreuder.hyves.nl/index.php?xmlHttp=1&module=pager&action=showPage&name=ms_lc_lo
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
Accept: */*
Host: tom-schreuder.hyves.nl
Connection: Keep-Alive
Cookie: GP=deadbeef

WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:576bee5e-d618-4c18-ab2b-2a6a3eb406b3>
WARC-Warcinfo-ID: <urn:uuid:2927591f-7b56-4e62-8682-c4bca829e422>
WARC-Concurrent-To: <urn:uuid:6565ab4c-dbe4-4aa3-ab90-9d6bb3d5eb3b>
WARC-Target-URI: http://tom-schreuder.hyves.nl/?xmlHttp=1&module=pager&action=showPage&name=ms_lc_lo
WARC-Date: 2013-11-19T16:27:40Z
WARC-IP-Address: 94.100.127.68
WARC-Block-Digest: sha1:GDFNREUJIRYGUDKRQYFTM5CHIMVVMOAR
WARC-Payload-Digest: sha1:RJ2UMQEXH4CW7ORMVW5A2CV6J5YH27FZ
Content-Type: application/http;msgtype=response
Content-Length: 824

HTTP/1.1 500 Internal Server Error
Server: nginx
Date: Tue, 19 Nov 2013 16:27:40 GMT
Content-Type: text/html; charset=ISO-8859-1
Connection: close
Cache-Control: private
Expires: 0
Pragma: no-cache
Set-Cookie: PHPSESSID=6353536653435356433363468303166373663303231636364316635336269393; path=/; domain=.hyves.nl; HttpOnly
X-Powered-By: HPHP
Content-Length: 451

<html>
<head>
<title>Er is een fout opgetreden</title>
<link rel="stylesheet" href="http://cache1.hyves-static.net/statics/style20.css" type="text/css">
</head>
<body style="padding: 5px;">
<h1 class="SubjectNolink"><i>Boink!</i></h1>
<span id="noJsMessage">Er gaat iets niet helemaal goed. Probeer het nog een keer.</span><br /><br /><br />Klik <a href="http://www.hyves.nl/>hier</a> om terug te gaan naar de homepage.
</body>
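On inspection, it looks like I'm grabbing the URLs from <form action="URL HERE">, which isn't desired: a form action is meant to be POSTed, so fetching it with a plain GET produces the 500 error above.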
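A minimal sketch of how the scraper could avoid picking up form actions, assuming the fragments are parsed with Python's standard `html.parser`. The class and function names (`FragmentLinkExtractor`, `extract_links`) are hypothetical and not taken from the actual grab script, which may extract links differently:

```python
from html.parser import HTMLParser


class FragmentLinkExtractor(HTMLParser):
    """Collect href values from <a> tags and ignore <form action="...">."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are treated as crawlable links. Form actions are
        # skipped because they are meant to be POSTed, not fetched with GET.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(fragment):
    """Return the anchor hrefs found in an HTML fragment (hypothetical helper)."""
    parser = FragmentLinkExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.links
```

With this approach the pager URL never enters the queue as a GET candidate, because the only place it appears in the fragment is the form's `action` attribute.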
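The rejection idea could be sketched roughly as follows. This is an illustration only: `REJECTED_PAGER_NAMES`, `is_pager_url`, and `filter_scraped_urls` are hypothetical names, and matching on the `name` query parameter (rather than a plain substring test for `ms_lc_lo`) is an assumption about what a safe filter would look like:

```python
from urllib.parse import parse_qs, urlsplit

# Pager names that must only be requested via POST (assumed set; the
# issue mentions just ms_lc_lo).
REJECTED_PAGER_NAMES = {"ms_lc_lo"}


def is_pager_url(url):
    """Return True if the URL's `name` query parameter is a rejected pager name."""
    query = parse_qs(urlsplit(url).query)
    return any(name in REJECTED_PAGER_NAMES for name in query.get("name", []))


def filter_scraped_urls(urls):
    """Drop pager URLs from a list of URLs scraped out of a fragment."""
    return [url for url in urls if not is_pager_url(url)]
```

For example, filtering a list that contains the failing URL from the WARC record and the profile root keeps only the profile root.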