aerokube / ggr

A lightweight load balancer used to create big Selenium clusters
https://aerokube.com/ggr/latest/
Apache License 2.0

[PROXY_ERROR] [guest] [dial tcp: lookup selenium-grid-1 on x.x.x.x:53: write udp y.y.y.y:53924->x.x.x.x:53: write: invalid argument] #312

Closed by cod-r 4 years ago

cod-r commented 4 years ago

Hi,

I'm testing ggr with two Selenium grids (hubs); each grid has 10 nodes (Chrome browsers). When I send 20 new session requests at once, I get random errors like:

2020/05/13 13:56:22 [608] [-] [PROXY_ERROR] [guest] [z.z.z.z] [http://selenium-grid-1:4444/wd/hub/session/b29fcde40b740853fae3f972c4867fda/url] [-] [-] [-] [dial tcp: lookup selenium-grid-1 on x.x.x.x:53: write udp y.y.y.y:59203->x.x.x.x:53: write: invalid argument]
2020/05/13 13:56:22 [609] [-] [SESSION_DELETED] [-] [z.z.z.z] [-] [selenium-grid-1:4444] [b29fcde40b740853fae3f972c4867fda] [-] [-]
2020/05/13 13:56:22 [609] [-] [PROXY_ERROR] [guest] [z.z.z.z] [http://selenium-grid-1:4444/wd/hub/session/b29fcde40b740853fae3f972c4867fda] [-] [-] [-] [dial tcp: lookup selenium-grid-1 on x.x.x.x:53: write udp y.y.y.y:59060->x.x.x.x:53: write: invalid argument]

Whenever the error happens for a session, ggr always logs the three lines above for it, and the following line appears in the Selenium grid log:

5/13/2020 13:57:40.330 WARN [SessionCleanup.null] - session ext. key b29fcde40b740853fae3f972c4867fda has TIMED OUT due to client inactivity and will be released.

This means the session is created but never used, so it times out in the grid. Note that the session ID is the same in both logs.
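To find which sessions are affected in a longer ggr log, the session ID can be pulled out of each [PROXY_ERROR] line and counted. This is a minimal sketch, not part of ggr: the function name failedSessions and the regular expression are hypothetical, and it assumes the proxied URL always contains /wd/hub/session/<id> as in the log lines above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// sessionInURL captures the session ID from a proxied WebDriver URL such as
// http://selenium-grid-1:4444/wd/hub/session/<id>/url.
// Assumption: session IDs are lowercase hex strings, as in the logs above.
var sessionInURL = regexp.MustCompile(`/wd/hub/session/([0-9a-f]+)`)

// failedSessions counts [PROXY_ERROR] lines per session ID in a ggr log.
func failedSessions(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "[PROXY_ERROR]") {
			continue // skip SESSION_DELETED and other event types
		}
		if m := sessionInURL.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// The three ggr log lines quoted above.
	log := `2020/05/13 13:56:22 [608] [-] [PROXY_ERROR] [guest] [z.z.z.z] [http://selenium-grid-1:4444/wd/hub/session/b29fcde40b740853fae3f972c4867fda/url] [-] [-] [-] [dial tcp: lookup selenium-grid-1 on x.x.x.x:53: write udp y.y.y.y:59203->x.x.x.x:53: write: invalid argument]
2020/05/13 13:56:22 [609] [-] [SESSION_DELETED] [-] [z.z.z.z] [-] [selenium-grid-1:4444] [b29fcde40b740853fae3f972c4867fda] [-] [-]
2020/05/13 13:56:22 [609] [-] [PROXY_ERROR] [guest] [z.z.z.z] [http://selenium-grid-1:4444/wd/hub/session/b29fcde40b740853fae3f972c4867fda] [-] [-] [-] [dial tcp: lookup selenium-grid-1 on x.x.x.x:53: write udp y.y.y.y:59060->x.x.x.x:53: write: invalid argument]`

	for id, n := range failedSessions(log) {
		fmt.Printf("%s: %d proxy errors\n", id, n)
	}
}
```

The resulting IDs can then be matched against the "TIMED OUT due to client inactivity" warnings in the grid log.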

The following error arrives in my test:

5/13/2020 4:25:09 PM org.openqa.selenium.json.JsonException: Expected to read a START_MAP but instead have: END. Last 0 characters read:
5/13/2020 4:25:09 PM Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
5/13/2020 4:25:09 PM System info: host: '8111d1230e5c', ip: 'x.x.x.x', os.name: 'Linux', os.arch: 'amd64', os.version: '4.19.0-9-amd64', java.version: '11.0.6'
5/13/2020 4:25:09 PM Driver info: driver.version: RemoteWebDriver
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.JsonInput.expect(JsonInput.java:290) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.JsonInput.beginObject(JsonInput.java:220) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.MapCoercer.lambda$apply$1(MapCoercer.java:64) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.JsonTypeCoercer.lambda$null$6(JsonTypeCoercer.java:145) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.JsonTypeCoercer.coerce(JsonTypeCoercer.java:126) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.Json.toType(Json.java:69) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.Json.toType(Json.java:55) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.json.Json.toType(Json.java:50) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:87) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:49) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:158) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:609) ~[selenium-remote-driver-3.141.59.jar!/:na]
5/13/2020 4:25:09 PM    at org.openqa.selenium.remote.RemoteWebDriver.getCurrentUrl(RemoteWebDriver.java:287) ~[selenium-remote-driver-3.141.59.jar!/:na]
...

I've been using Selenium Grid for a long time, and this is the first time I've seen this type of error (org.openqa.selenium.json.JsonException). It happened only with ggr, so this might be a good clue for debugging. Note: I replaced the original IPs with x and y.

Expected behaviour:

Start 10 browser sessions on each grid without errors.

Actual behaviour:

A random number of sessions fail: sometimes 3, sometimes 10, rarely none.

This is my quota:

<qa:browsers xmlns:qa="urn:config.gridrouter.qatools.ru">
<browser name="chrome" defaultVersion="79.0.3945.130">
    <version number="79.0.3945.130">
        <region name="1">
            <host name="selenium-grid-1" port="4444" count="1"/>
            <host name="selenium-grid-2" port="4444" count="1"/>
        </region>
    </version>
</browser>
</qa:browsers>

Grid settings (same on both grids):

{
  "browserTimeout": 120,
  "capabilityMatcher": "org.openqa.grid.internal.utils.DefaultCapabilityMatcher",
  "cleanUpCycle": 5000,
  "custom": {
  },
  "debug": false,
  "host": "x.x.x.x",
  "jettyMaxThreads": -1,
  "newSessionRequestCount": 0,
  "newSessionWaitTimeout": -1,
  "port": 4444,
  "registry": "org.openqa.grid.internal.DefaultGridRegistry",
  "role": "hub",
  "servlets": [
  ],
  "slotCounts": {
    "free": 10,
    "total": 10
  },
  "success": true,
  "throwOnCapabilityNotPresent": true,
  "timeout": 90,
  "withoutServlets": [
  ]
}

The grids, nodes, and ggr are running on very powerful servers, so a lack of resources is not the problem.

vania-pooh commented 4 years ago

@cod-r selenium-grid-1 should be a valid hostname where Selenium Hub is running.

cod-r commented 4 years ago

As I already pointed out, some of the sessions work. Sometimes I get no errors at all and all 20 tests run cleanly, but most of the time I encounter these [PROXY_ERROR] entries for some of the tests (never for all of them).

cod-r commented 4 years ago

If there is something I can do to debug this problem further, please tell me; I already have everything in place and it's easy to reproduce.

vania-pooh commented 4 years ago

@cod-r I would check your DNS settings (/etc/resolv.conf), e.g. whether you are using a caching DNS server.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.