cmullaparthi / ibrowse

Erlang HTTP client

set max sessions across subdomains #124

Open varnit opened 9 years ago

varnit commented 9 years ago

Hi, Is it possible to set the max sessions for a domain and have it work across all its subdomains? For example if I set the following:

ibrowse:set_max_sessions("hotmail.com", 443, 100)

I would want a maximum of 100 connections for hotmail.com and all its subdomains (m.hotmail.com, bay01.hotmail.com, etc.)

Is this possible today?

cmullaparthi commented 9 years ago

Hi Varnit

Unfortunately no, but it is easy enough to do. I will look into it this weekend, as I will be refactoring ibrowse a bit to integrate other pull requests.

varnit commented 9 years ago

OK, thanks! Let me know if you need help with anything.

VitoVan commented 8 years ago

@cmullaparthi I assume this did not get done that weekend?

cmullaparthi commented 8 years ago

I'm afraid not :-) I take it this is important for you?

VitoVan commented 8 years ago

@cmullaparthi Kind of important, forgive my poor English, let me tell a story.

I got a bunch of urls from my boss like this:

http://www.example0.com/foo/bar
http://test.example0.com/foo/bar
http://foo.example0.com/foo/bar
http://bar.example0.com/foo/bar
http://www.example1.com/foo/bar
http://test.example2.com/foo/bar
http://foo.example3.com/foo/bar
http://bar.example1.com/foo/bar
...

Then I got a configuration file from my boss like this:

example0.com --> concurrent: 1
bar.example1.com --> concurrent: 2
bar.example2.com --> concurrent: 3

Then when I request the urls above, I need to limit their concurrency by the configuration above.

And the configuration file, in my boss's opinion:

example0.com of course means *.example0.com and example0.com.

And I can't tell my boss that ibrowse does not have that kind of configuration, so I have to handle this in my application.

And the other thing is that the urls my boss gives me change dynamically. So I can't tell my boss: "Give me all your urls, and let me generate an appropriate configuration file for you." I think my boss will reply: "No, programmer, I won't. I'll add a url to the list whenever I want. This is easy, handle it."

So, once my program has started running, my boss may come to my desk and give me another url, saying: "Add it to the list", and I will do as my boss said.

For now, here is my solution:

  1. I get a url, http://test.example0.com/foo/bar, that needs to be handled
  2. I extract the host from the url: test.example0.com
  3. I match the host test.example0.com against the configuration file, using ends_with
  4. It matches example0.com --> concurrent: 1
  5. I call :ibrowse.set_max_sessions("test.example0.com", 80, 1)
  6. I think it's done

Then if I got any url like:

http://test.example0.com/foo/bar1
http://test.example0.com/foo/bar2
http://test.example0.com/foo/bar3
http://test.example0.com/foo/bar4

the steps above will be processed again, because I am lazy and didn't write code to store the configurations and check whether the domain is already configured.

Well, end of story.

I am not quite sure if it is the right solution, but it seems to be working.

BUT: I would love to immediately remove the code I have written to match subdomains, if ibrowse had this feature.
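
The matching in steps 3 and 4 could be sketched roughly as follows. This is a minimal sketch, not part of ibrowse: the module name `subdomain_limit`, the function `match_limit/2`, and the `{Suffix, Max}` configuration shape are all assumptions made for illustration. It matches only on label boundaries, so a plain ends_with false positive such as "notexample0.com" is avoided, and it prefers the most specific entry when several match.

```erlang
-module(subdomain_limit).
-export([match_limit/2]).

%% Hypothetical helper, not part of ibrowse.
%% Config is a list of {DomainSuffix, MaxConcurrent} pairs.
%% Returns {ok, Suffix, Max} for the most specific (longest) matching
%% entry, or nomatch when no entry covers Host.
-spec match_limit(string(), [{string(), pos_integer()}]) ->
          {ok, string(), pos_integer()} | nomatch.
match_limit(Host, Config) ->
    Matches = [{length(Suffix), Suffix, Max}
               || {Suffix, Max} <- Config, covers(Suffix, Host)],
    case lists:reverse(lists:sort(Matches)) of
        [{_Len, Suffix, Max} | _] -> {ok, Suffix, Max};
        []                        -> nomatch
    end.

%% True when Host equals Suffix or ends with "." ++ Suffix, so that
%% "notexample0.com" is not treated as a subdomain of "example0.com".
covers(Suffix, Host) ->
    Host =:= Suffix orelse lists:suffix("." ++ Suffix, Host).
```

With the result in hand, step 5 would pass `Max` to `ibrowse:set_max_sessions/3` for the concrete host, exactly as in the list above.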

cmullaparthi commented 8 years ago

I loved this story :-)

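There are a couple of complications with this:

  • One or more of your subdomains may be unreachable because there are lots of requests to another subdomain.
  • Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.

Are you happy with both these limitations? If so I will go ahead and implement it.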

VitoVan commented 8 years ago

@cmullaparthi Thanks for your reply ~

> One or more of your subdomains may be unreachable because there are lots of requests to another subdomain

  • If a subdomain is unreachable because of the server's bandwidth or capacity, then that's fine, since we limit max_sessions on the root domain for a reason.
  • If a subdomain is unreachable because of the retry_later message from ibrowse, then that is also reasonable. It is exactly what we want.

> Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.

Expensive is a relative word.

Yesterday I refactored my code to limit requests better. I use poolboy to set up an ibrowse pool for every root domain. Every time I get a url, I check whether a pool for its root domain exists. If it exists, I use that pool; otherwise I create a new pool for this root domain.

If what you are going to implement is not more expensive than my approach, I think it is worth a try.

Thank you.

cmullaparthi commented 8 years ago

Okay, good. No, the solution will be cheaper than using an external pooling mechanism. I'll create a branch with the proposed changes so you can try.

VitoVan commented 8 years ago

@cmullaparthi Thanks, you are so nice!

cmullaparthi commented 8 years ago

I've pushed some changes to the issue_124 branch. See 3fc7e78aad6ab4b882da4268d17871d1fbc1cc5f

Usage:

$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}

2>
f(), 
ibrowse:set_max_sessions("google.com", 80, 1), %% Set the LB config for the root domain

Res_1 = ibrowse:send_req("http://www.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]), %% New option

io:format("Res_1: ~p~n", [Res_1]), 

ibrowse:show_dest_status(), 

Res_2 = ibrowse:send_req("http://m.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]),  %% New option

io:format("Res_2: ~p~n", [Res_2]), 

ibrowse:show_dest_status().
Res_1: {ok,"302",
           [{"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"Location",
             "http://www.google.co.uk/?gfe_rd=cr&ei=GBZpV-W9IYHS8AeEya-oAg"},
            {"Content-Length","261"},
            {"Date","Tue, 21 Jun 2016 10:25:28 GMT"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&amp;ei=GBZpV-W9IYHS8AeEya-oAg\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
Res_2: {error,retry_later}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
                         m.google.com:80 | 32791 | 0          | <0.41.0>

The same test succeeds if you set max_sessions to 2.

$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}
2> 
f(), 
ibrowse:set_max_sessions("google.com", 80, 2), 
Res_1 = ibrowse:send_req("http://www.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]), %% New option

io:format("Res_1: ~p~n", [Res_1]), 

ibrowse:show_dest_status(), 

Res_2 = ibrowse:send_req("http://m.google.com", [], get, [], 
                         [{use_subdomain_lb_config, {"google.com", 80}}]),  %% New option

io:format("Res_2: ~p~n", [Res_2]), 

ibrowse:show_dest_status().
Res_1: {ok,"302",
           [{"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"Location",
             "http://www.google.co.uk/?gfe_rd=cr&ei=dBlpV-mXDpPS8AfI1IFY"},
            {"Content-Length","259"},
            {"Date","Tue, 21 Jun 2016 10:39:48 GMT"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&amp;ei=dBlpV-mXDpPS8AfI1IFY\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
Res_2: {ok,"302",
           [{"Location","http://www.google.com/mobile/other/"},
            {"Cache-Control","private"},
            {"Content-Type","text/html; charset=UTF-8"},
            {"X-Content-Type-Options","nosniff"},
            {"Date","Tue, 21 Jun 2016 10:39:48 GMT"},
            {"Server","sffe"},
            {"Content-Length","232"},
            {"X-XSS-Protection","1; mode=block"}],
           "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.com/mobile/other/\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port                              | ETS   | Num conns  | LB Pid
================================================================================
                       www.google.com:80 | 20500 | 1          | <0.41.0>
                           google.com:80 | 16403 | 0          | <0.41.0>
                         m.google.com:80 | 32791 | 1          | <0.41.0>

VitoVan commented 8 years ago

@cmullaparthi Awesome! Trying...

VitoVan commented 8 years ago

When I use this feature, it seems... well, a little tricky?

  1. Got a limitation like this: "example.com" -> 2
  2. Received a url like this: http://test.example.com
  3. Got the root domain of http://test.example.com, which is example.com
  4. Send the request, with option
ibrowse:send_req("http://test.example.com", [], get, [], 
                         [{use_subdomain_lb_config, {"example.com", 80}}])

Suddenly I realized something. My boss said: "The server example.com is weak, we won't send more than 2 requests at the same time".

When my boss said this, the meaning seemed to include: "I don't know what a port means, and I don't care what 443 or 80 or even 8080 mean. They are just webpages. Go get them, with no more than 2 requests at the same time".

So at this point, I think maybe it's better to meet these demands in my application instead of in ibrowse. What do you think? @cmullaparthi
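
For reference, an application-side limit that ignores the port could look something like the sketch below. It is only an illustration under stated assumptions: the module `root_domain_limiter` and its functions are hypothetical names, not ibrowse APIs, and it uses a plain ETS counter keyed by root domain rather than poolboy. The `{error, retry_later}` return value simply mirrors what ibrowse reports in the transcript above.

```erlang
-module(root_domain_limiter).
-export([init/0, acquire/2, release/1]).

-define(TAB, root_domain_limiter).

%% Create the public counter table. Call once from a long-lived process,
%% since the table is owned by the process that creates it.
init() ->
    ets:new(?TAB, [named_table, public, set]),
    ok.

%% Try to reserve a slot for RootDomain with at most Max requests in flight.
%% The port is deliberately not part of the key.
acquire(RootDomain, Max) ->
    %% Atomically increment, inserting {RootDomain, 0} first if absent.
    case ets:update_counter(?TAB, RootDomain, {2, 1}, {RootDomain, 0}) of
        N when N =< Max ->
            ok;
        _ ->
            %% Over the limit: undo the increment and refuse.
            ets:update_counter(?TAB, RootDomain, {2, -1}),
            {error, retry_later}
    end.

%% Give the slot back once the request has finished.
release(RootDomain) ->
    ets:update_counter(?TAB, RootDomain, {2, -1}),
    ok.
```

A caller would wrap `ibrowse:send_req` between `acquire/2` and `release/1`, ideally in a `try ... after` so the slot is returned even if the request crashes. Under contention the counter can briefly overshoot and refuse a request that would have fit, but it never admits more than `Max` at once.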

cmullaparthi commented 8 years ago

Yeah, it's not particularly elegant. But I feel that is the nature of the problem. If you know that you are always going to shape traffic using the 1st level subdomain, your code, I suppose, could be simpler using this feature?

-include_lib("ibrowse/include/ibrowse.hrl"). %% defines the #url{} record

invoke_ibrowse(Url, Headers, Payload, Method, Options) ->
    #url{host = Host, port = Port} = ibrowse_lib:parse_url(Url),
    Host_tokens = string:tokens(Host, "."),
    %% Keep the last two labels, e.g. "test.example.com" -> "example.com"
    LB_shaping_domain = string:join(lists:nthtail(length(Host_tokens) - 2, Host_tokens), "."),
    ibrowse:send_req(Url, Headers, Method, Payload, [{use_subdomain_lb_config, {LB_shaping_domain, Port}} | Options]).

I suppose the above is more bearable than having to maintain your own pooling mechanism?
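
The "last two labels" rule used above can be isolated and checked on its own. The sketch below is an assumption-laden illustration, not ibrowse code: the module name `shaping_domain` and the function `from_host/1` are made up here. It also guards the case of a host with fewer than two labels, where `lists:nthtail/2` would otherwise receive a negative count and crash.

```erlang
-module(shaping_domain).
-export([from_host/1]).

%% Return the last two dot-separated labels of Host, or Host itself
%% when it has two labels or fewer (e.g. "localhost").
-spec from_host(string()) -> string().
from_host(Host) ->
    Tokens = string:tokens(Host, "."),
    case length(Tokens) of
        N when N > 2 -> string:join(lists:nthtail(N - 2, Tokens), ".");
        _            -> Host
    end.
```

Note that this rule is purely textual. A host under a multi-part suffix such as www.google.co.uk would be shaped by "co.uk", and an IP address would be cut down to its last two octets, so an application relying on it would probably need extra handling for those cases.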