Closed hughht5 closed 12 years ago
I think the output is gzip-compressed; you need to specify in the request headers that you only accept plain text, or decompress the response yourself.
Thanks a lot, that makes perfect sense. As I'm a noob to this, though, could you offer an example of how to decompress the gzipped response? I don't know how to edit the headers.
Thanks again, Hugh
Zombie will now send an Accept-Encoding header to indicate it does not support gzip.
The simple script below returns a bunch of rubbish. It works for most websites, but not William Hill:
var Browser = require("zombie");
var assert = require("assert");

// Load the page
var browser = new Browser();
browser.visit("http://sports.williamhill.com/bet/en-gb/betting/y/5/et/Football.html", function () {
  browser.wait(function () {
    console.log(browser.html());
  });
});
Run with node.
output: S����J����ꪙRUݒ�kf�6���Efr2�Riz�����^��0�X� ��{�^�a�yp��p�����Ή��`��(���S]-��'N�8q�����/���?�ݻ��u;�݇�ׯ�Eiٲ>��-���3�ۗG�Ee�,��mF���MI��Q�۲������ڊ�ZG��O�J�^S�C~g��JO�緹�Oݎ���P����ET�n;v������v���D�tvJn��J�8'��햷r�v:��m��J��Z�nh�]�� ����Z����.{Z��Ӳl�B'�.¶D�~$n�/��u"�z�����Ni��"Nj��\00_I\00\��S��O�E8{"�m;�h��,o��Q�y��;��a[������c��q�D�띊?��/|?:�;��Z!}��/�wے�h�<�������%������A�K=-a��~' (actual output is much longer)
Does anyone know why this happens, and specifically why it happens on the only site I actually want to scrape?
Thanks