Tjatse / req-fast

Fastest way to fetch web content (HTML stream) from a server; supports redirects, automatic decoding (e.g. Chinese), gzip, cookies, proxy...
Apache License 2.0

Fails for raw deflate encoded response #3

Closed entertainyou closed 8 years ago

entertainyou commented 8 years ago

url: http://mp.weixin.qq.com/s?__biz=MjM5MzIyMDExOQ==&mid=400941252&idx=1&sn=0d98926515101df82e552720e93d6f6a&scene=2&srcid=11298yEW5zEhufnUxomV561q&from=timeline&isappinstalled=0#wechat_redirect

The server sends a "Content-Encoding: deflate" response header, but the response body is raw deflate data.

Browsers (Firefox and Chromium) load the URL correctly.

I checked the code: recovering from an inflate error is not an easy change (on error, we could retry the inflate with zlib.createInflateRaw). Alternatively, we could drop deflate from the Accept-Encoding request header (the request library only sends "gzip").
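
The retry idea above can be sketched as follows. This is only a minimal illustration on a fully buffered body, not the fix that landed in req-fast; `inflateLenient` is a hypothetical name. A streaming version built on zlib.createInflateRaw would also have to replay the bytes already consumed before the error, which is likely why the change is not trivial.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of req-fast): decode a buffered body that
// was labelled "Content-Encoding: deflate".
function inflateLenient(buffer) {
  try {
    // Standard case: deflate data inside a zlib wrapper.
    return zlib.inflateSync(buffer);
  } catch (err) {
    // Fallback: raw deflate stream without the zlib header.
    // If this also fails, the error propagates to the caller.
    return zlib.inflateRawSync(buffer);
  }
}

// Usage: both framings of the same text should decode identically.
const text = 'hello deflate';
const wrapped = zlib.deflateSync(Buffer.from(text));
const raw = zlib.deflateRawSync(Buffer.from(text));
console.log(inflateLenient(wrapped).toString() === text);
console.log(inflateLenient(raw).toString() === text);
```

Trying the wrapped format first keeps well-behaved servers on the normal path; the raw attempt only runs after a failure.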

Tjatse commented 8 years ago

Thanks a lot. I noticed this a few days ago but hadn't figured out an elegant way to decompress the content automatically. It's now fixed in 12cfa8c and fully covered by tests.

it('auto handled @entertainyou', function(done){
  req('http://mp.weixin.qq.com/s?__biz=MjM5MzIyMDExOQ==&mid=400941252&idx=1&sn=0d98926515101df82e552720e93d6f6a&scene=2&srcid=11298yEW5zEhufnUxomV561q&from=timeline&isappinstalled=0#wechat_redirect', function(err, resp){
    should.not.exist(err);
    should.exist(resp.body);
    expect(resp.body).to.be.a('string');
    done();
  });
});

benchmark:

node benchmark/elapsed_time.js
A sample of 1000 cases:
request x 2.345 ms (+1307.25%, -57.36%).
req-fast x 2.009 ms (+895.52%, -100.00%).
request x 2.449 ms (+879.99%, -59.17%).
req-fast x 1.99 ms (+754.27%, -100.00%).

node benchmark/memory_usage.js
A sample of 1000 cases:
request x 31576.064 bytes (+7553.39%, -2655.45%).
req-fast x 11227.136 bytes (+43752.61%, -37020.83%).
request x 30715.904 bytes (+7741.05%, -2820.36%).
req-fast x 11337.728 bytes (+43975.14%, -45945.38%).

entertainyou commented 8 years ago

@Tjatse, thanks for the quick fix. How should I interpret the benchmark results? (There are two rows for req-fast, and what are the numbers inside the parentheses?)

A sample of 1000 cases:
request x 31576.064 bytes (+7553.39%, -2655.45%).
req-fast x 11227.136 bytes (+43752.61%, -37020.83%).
request x 30715.904 bytes (+7741.05%, -2820.36%).
req-fast x 11337.728 bytes (+43975.14%, -45945.38%).

Tjatse commented 8 years ago

I just ran it twice :) The numbers in parentheses follow this format:

[Module Name] x [AVG] [Unit] (+[(MAX / AVG - 1) * 100]%, -[(1 - MIN / AVG) * 100]%).
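
To make the format concrete, here is a small sketch that builds one such line from raw samples. `formatResult` is a hypothetical helper, not the code in the repository's benchmark scripts, and the rounding of the average to three decimals is an assumption based on the output shown earlier.

```javascript
// Hypothetical helper: formats one benchmark line from raw samples,
// following the format described above.
function formatResult(name, samples, unit) {
  const avg = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  const max = Math.max(...samples);
  const min = Math.min(...samples);
  const above = (max / avg - 1) * 100; // how far MAX sits above AVG, in %
  const below = (1 - min / avg) * 100; // how far MIN sits below AVG, in %
  // Assumption: the average is rounded to three decimals.
  const avgText = String(Math.round(avg * 1000) / 1000);
  return `${name} x ${avgText} ${unit} (+${above.toFixed(2)}%, -${below.toFixed(2)}%).`;
}

console.log(formatResult('demo', [1, 2, 3], 'ms'));
// prints "demo x 2 ms (+50.00%, -50.00%)."
```

Read this way, a value of -100.00% means the smallest sample was 0, and a value beyond -100% means the smallest sample was negative, which for the memory benchmark would be consistent with memory shrinking during a sample.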

How to run benchmark:

Run local server

node benchmark/server.js 

Elapsed Time

node benchmark/elapsed_time.js 

Memory Usage

node benchmark/memory_usage.js 

Tjatse commented 8 years ago

I use this module only to grab content from the internet, and it works great. As I said:

GC affects these numbers a lot, and I do not trust the result of process.memoryUsage().rss; request should perform better.

I use request in lots of other situations, such as posting / uploading data, and it is fabulous.
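
For readers who want to see why the GC caveat matters, the sketch below measures an RSS delta around a workload. It is an illustration only, not the repository's benchmark/memory_usage.js; `rssDelta` is a hypothetical helper, and the optional `global.gc()` call is only available when Node is started with the --expose-gc flag.

```javascript
// Hypothetical helper: RSS change (in bytes) across one synchronous workload.
function rssDelta(work) {
  // global.gc exists only when Node is started with --expose-gc.
  if (typeof global.gc === 'function') global.gc();
  const before = process.memoryUsage().rss;
  work();
  // The delta can be negative if memory is released while work() runs.
  return process.memoryUsage().rss - before;
}

const delta = rssDelta(() => {
  const chunks = [];
  for (let i = 0; i < 1000; i++) chunks.push(Buffer.alloc(1024));
  return chunks.length;
});
console.log(`rss delta: ${delta} bytes`);
```

Because a collection can happen at any point, repeated runs of the same workload may report very different deltas, so a single number should be read with caution.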