elixir-wallaby / wallaby

Concurrent browser tests for your Elixir web apps.
https://twitter.com/elixir_wallaby
MIT License

(RuntimeError) Wallaby had an internal issue with HTTPoison #365

Closed: stephaniewilkinson closed this issue 2 years ago

stephaniewilkinson commented 6 years ago

Issue

When I switch to using Chrome, I get this error:

     ** (RuntimeError) Wallaby had an internal issue with HTTPoison
     stacktrace:
       (wallaby) lib/wallaby/httpclient.ex:32: Wallaby.HTTPClient.make_request/4
       (wallaby) lib/wallaby/experimental/chrome.ex:64: Wallaby.Experimental.Chrome.start_session/1
       (wallaby) lib/wallaby.ex:80: Wallaby.start_session/1
       (school) test/support/feature_case.ex:25: SchoolWeb.FeatureCase.__ex_unit_setup_0/1
       (school) test/support/feature_case.ex:1: SchoolWeb.FeatureCase.__ex_unit__/2
       test/school_web/features/view_test.exs:1: SchoolWeb.UserListTest.__ex_unit__/2

Versions: elixir: "~> 1.4",

Test Code & HTML

Here's my test:

defmodule SchoolWeb.UserListTest do
  use SchoolWeb.FeatureCase, async: true

  import Wallaby.Query, only: [css: 2]

  test "homepage is accessible", %{session: session} do
    session
    |> visit("/")
    |> assert_has(css(".header", count: 1, text: "Welcome"))
  end
end

Here's my FeatureCase:

defmodule SchoolWeb.FeatureCase do
  use ExUnit.CaseTemplate

  using do
    quote do
      use Wallaby.DSL

      alias School.Repo
      import Ecto
      import Ecto.Changeset
      import Ecto.Query

      import SchoolWeb.Router.Helpers
    end
  end

  setup tags do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(School.Repo)

    unless tags[:async] do
      Ecto.Adapters.SQL.Sandbox.mode(School.Repo, {:shared, self()})
    end

    metadata = Phoenix.Ecto.SQL.Sandbox.metadata_for(School.Repo, self())
    {:ok, session} = Wallaby.start_session(metadata: metadata)
    {:ok, session: session}
  end
end

Here's my HTML:

<h1 class="header center">Welcome to School!</h1>

Thanks for any help you can give!

keathley commented 6 years ago

Thanks for the report @stephaniewilkinson. Can you confirm what version of chrome and chromedriver you have installed?

stephaniewilkinson commented 6 years ago

My Chrome is Version 66.0.3359.117 (Official Build) (64-bit).
My ChromeDriver is 2.37.544337 (8c0344a12e552148c185f7d5117db1f28d6c9e85).
I had previously tried on Chrome 65 with the same result.

For phantomjs I didn't customize the version, just ran this:

npm install -g phantomjs-prebuilt

I should probably mention that the tests run fine until I try to switch to Chrome.

keathley commented 6 years ago

Thanks! All of that looks correct to me. We get those HTTPoison errors when there's an issue communicating with chromedriver. Based on the stack trace, it looks like it's not able to create a new session, which typically indicates an issue with Wallaby's ability to talk to chromedriver or chromedriver's ability to talk to Chrome.

stephaniewilkinson commented 6 years ago

Thanks! Any ideas what to try next?

keathley commented 6 years ago

What OS are you using? Are both Chrome and chromedriver in your path? Another thing to try would be the latest Elixir and OTP versions.
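
One way to answer the PATH question from inside the project is a few lines run in `iex` or as a script. This is only a sketch: `System.find_executable/1` is standard Elixir, but the `PathCheck` module is hypothetical and the binary names are assumptions (on macOS, for instance, Chrome usually lives in an app bundle and is not on the PATH at all).

```elixir
# Hypothetical helper, not part of Wallaby: reports which binaries are on PATH.
defmodule PathCheck do
  # Binary names are assumptions; adjust them for your OS and install method.
  @binaries ["chromedriver", "google-chrome", "phantomjs"]

  def report do
    Map.new(@binaries, fn name ->
      # find_executable/1 returns the full path, or nil when the binary
      # cannot be found on PATH.
      {name, System.find_executable(name)}
    end)
  end
end

IO.inspect(PathCheck.report(), label: "binaries on PATH")
```

A `nil` next to `chromedriver` would point at a PATH problem rather than at Wallaby itself.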

dsdshcym commented 6 years ago

Got a similar error:

     ** (RuntimeError) Wallaby had an internal issue with HTTPoison
     code: |> assert_has(css("td", text: node_id_1))
     stacktrace:
       (wallaby) lib/wallaby/httpclient.ex:32: Wallaby.HTTPClient.make_request/4
       (wallaby) lib/wallaby/experimental/selenium/webdriver_client.ex:125: Wallaby.Experimental.Selenium.WebdriverClient.text/1
       (wallaby) lib/wallaby/driver/log_checker.ex:6: Wallaby.Driver.LogChecker.check_logs!/2
       (wallaby) lib/wallaby/browser.ex:910: Wallaby.Browser.matching_text?/2
       (elixir) lib/enum.ex:2872: Enum.filter_list/2
       (elixir) lib/enum.ex:2873: Enum.filter_list/2
       (wallaby) lib/wallaby/browser.ex:903: Wallaby.Browser.validate_text/2
       (wallaby) lib/wallaby/browser.ex:925: anonymous fn/3 in Wallaby.Browser.execute_query/2
       (wallaby) lib/wallaby/br
                       _build/test/lib/wallaby/priv/run_command.sh: line 7: 85966 Segmentation fault: 11  "$@"

Not sure if it's the same issue though.

I'm running macOS 10.13.5, Chrome 67.0.3396.79, ChromeDriver 2.40.565386, Elixir 1.6.5 (compiled with OTP 20).

keathley commented 6 years ago

We need to add more backoff and jitter to our http client. We see these issues when chromedriver starts to get overwhelmed with requests.
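
For readers unfamiliar with the idea, backoff with jitter can be sketched roughly as below. This is not Wallaby's implementation: the `RetryWithJitter` module, the base delay, the cap, and the default attempt count are all assumptions chosen for illustration.

```elixir
# Hypothetical sketch of exponential backoff with full jitter.
defmodule RetryWithJitter do
  @base_ms 50     # assumed starting delay
  @max_ms 2_000   # assumed upper bound on any single delay

  # Delay for a 0-based attempt: a random value between 0 and
  # min(@max_ms, @base_ms * 2^attempt), inclusive.
  def delay_ms(attempt) do
    ceiling = min(@max_ms, @base_ms * trunc(:math.pow(2, attempt)))
    :rand.uniform(ceiling + 1) - 1
  end

  # Calls fun until it returns {:ok, _} or the attempts run out,
  # sleeping for a jittered delay between attempts.
  def retry(fun, attempts_left \\ 5, attempt \\ 0) when attempts_left > 0 do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _} = error when attempts_left == 1 ->
        error

      {:error, _} ->
        Process.sleep(delay_ms(attempt))
        retry(fun, attempts_left - 1, attempt + 1)
    end
  end
end
```

Any function that returns `{:ok, _}` or `{:error, _}`, such as an HTTPoison request, can be passed to `retry/3`. The random component spreads retries out so that concurrent tests do not all hit chromedriver again at the same instant.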

calvinkw1 commented 6 years ago

@keathley any update here? Am running into the same issue:

** (RuntimeError) Wallaby had an internal issue with HTTPoison
     stacktrace:
       (wallaby) lib/wallaby/httpclient.ex:32: Wallaby.HTTPClient.make_request/4
       (wallaby) lib/wallaby/experimental/chrome.ex:64: Wallaby.Experimental.Chrome.start_session/1
       (wallaby) lib/wallaby.ex:80: Wallaby.start_session/1
       (orderbook) test/support/browser_feature_case.ex:17: Test.Support.Browser.Feature.Case.__ex_unit_setup_all_0/1
       (orderbook) test/support/browser_feature_case.ex:1: Test.Support.Browser.Feature.Case.__ex_unit__/2
       test/redacted/features/browser/redacted.exs:1: Redacted.Features.Browser.Redacted.__ex_unit__/2

I have chromedriver 2.36.540469 installed, Elixir 1.6.4 (compiled with OTP 20), Mac OS 10.13.5

keathley commented 6 years ago

I'm working on tracking down the root cause but so far no luck. Without a way to reliably reproduce this issue locally, it's very difficult to ensure we have a proper fix. If you can provide a test case, that would greatly assist the effort here.

keathley commented 6 years ago

I've opened a PR that may help to reduce some of the issues that folks are seeing. If you would like to test out that branch locally that would be very useful.

keathley commented 6 years ago

The PR is now on master so if you'd like to test with that it would be very appreciated.

calvinkw1 commented 6 years ago

Sorry.. didn't work. Getting a similar error:

 1) Redacted.Features.Browser.AddressTests: failure on setup_all callback, test invalidated
     ** (RuntimeError) Wallaby had an internal issue with HTTPoison:
     %HTTPoison.Error{id: nil, reason: :econnrefused}
     stacktrace:
       (wallaby) lib/wallaby/httpclient.ex:37: Wallaby.HTTPClient.make_request/5
       (wallaby) lib/wallaby/experimental/chrome.ex:72: Wallaby.Experimental.Chrome.start_session/1
       (wallaby) lib/wallaby.ex:80: Wallaby.start_session/1
       ...

This seems to be only for me.. works fine on Circle CI and on other colleagues' machines. Not sure why this is.

I can provide whatever you need but note that I'm an Elixir newb...

evax commented 6 years ago

I've been having a similar issue and realized that a lot of chrome and chromedriver processes had been accumulating - killing those fixes it for me.

keathley commented 6 years ago

@evax Like they're being orphaned after running tests?

@calvinkw1 Can you provide any other insights into what might be different on your machine? Different OS, chromedriver version, Chrome version, etc.? Based on the error you posted it looks like chromedriver isn't even starting, hence the :econnrefused. Are you sure that chromedriver is in your path? If we can narrow down why that's happening then we might be able to provide a more meaningful error.

evax commented 6 years ago

@keathley Yes. In case it helps: I wasn't running the full test suite but passing mix test the path of a single test file as an argument, in an umbrella.

keathley commented 6 years ago

I wonder if the cleanup script isn't being run correctly because of the umbrella or something similar.

evax commented 6 years ago

It could be because the path is relative to the app, so when you have multiple apps in an umbrella it will likely only work for one app; the others display: Test patterns did not match any file: path/to/testfile_test.exs

calvinkw1 commented 6 years ago

Ah shoot. This is probably something I should have mentioned as well. I was passing in a specific file path rather than running the entire test suite. Running the entire test suite passes without issue.

Mac OS High Sierra 10.13.5

Google Chrome is up to date
Version 67.0.3396.99 (Official Build) (64-bit)

chromedriver --version
ChromeDriver 2.36.540469 (1881fd7f8641508feb5166b7cae561d87723cfa8)

Watching htop, I don't see a chromedriver process run when I only pass in a file path, but I do when I run the full suite.

Also, if it matters, I work on a Ruby project that has ChromeDriver 2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363) installed.

keathley commented 6 years ago

@calvinkw1 The only thing I can think of in that case is that it's not running the test_helper.exs when you run that specific file. Or that file isn't using the correct FeatureCase or something similar.
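
For context, the part of test_helper.exs that matters here is the line that starts Wallaby before any feature test runs. A minimal sketch, assuming the conventional file location (in an umbrella, each app has its own copy):

```elixir
# test/test_helper.exs (conventional location; each umbrella app has its own)
ExUnit.start()

# Start the :wallaby application before any feature test runs.
# If this line never executes, nothing launches chromedriver and
# session creation fails with a connection error.
{:ok, _} = Application.ensure_all_started(:wallaby)
```

If this file is skipped, or belongs to a different app than the test file being run, no chromedriver process would show up, which would match the htop observation above.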

mjankowski commented 5 years ago

After using chromedriver+wallaby successfully on an Elixir test suite for a while, I saw this issue for the first time today. I remembered that last week, while working on other projects, I had fiddled around with my chromedriver install: I had a few Ruby projects and wound up using the chromedriver-helper gem in them to get local and CI working well together.

Anyway - I have the same chromedriver version installed (ChromeDriver 2.42.591059 (a3d9684d10d61aa0c45f6723b327283be1ebaad8)) via both homebrew and as an rbenv shim from the gem installs.

When I manipulate my path to get the chromedriver installed via homebrew as the chromedriver that wallaby uses, everything is fine. When I update path to use the rbenv shim, I see the "Wallaby had an internal issue with HTTPoison" sort of errors described here.

I'm not sure what if anything to do next to help debug what these installs are doing differently - but maybe this data point helps someone. Happy to do more if I can be useful.

keathley commented 5 years ago

Thanks @mjankowski. It seems like something has changed recently and has broken our interactions with chromedriver.

sparta-developers commented 5 years ago

We also see

** (RuntimeError) Wallaby had an internal issue with HTTPoison:
%HTTPoison.Error{id: nil, reason: :econnrefused}

when running Wallaby tests in our two apps. In one app, we can set the test concurrency to 2 and the error goes away. In the other app, we always get the error unless we slow things down: we forked Wallaby to increase the jitter, and we pass in our own implementation of create_session that sleeps for 500ms inside a mutex:

def create_session(base_url, capabilities) do
  Mutex.under(:chrome_session_mutex, :start_session, :infinity, fn ->
    :timer.sleep(500)

    Wallaby.Experimental.Selenium.WebdriverClient.create_session(base_url, capabilities)
  end)
end
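
The other workaround mentioned above, capping test concurrency at 2, can be set through ExUnit's `max_cases` option. A minimal sketch, assuming the usual test_helper.exs location:

```elixir
# test/test_helper.exs
# max_cases limits how many test cases ExUnit runs concurrently.
ExUnit.start(max_cases: 2)
```

The same limit can be applied to a single run with `mix test --max-cases 2`.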

We're using chromedriver 74.0.3729.6 installed via homebrew on MacOS.

Are other people having success with chromedriver?

keathley commented 5 years ago

I suspect that chromedriver just can't keep up with the amount of traffic we're sending it at the same time. We could try to force this to behave better by setting limits on the hackney pool so that we don't overwhelm chromedriver as often.

keathley commented 5 years ago

I haven't tried this but I wonder if the new puppet api thing that google has would allow for more performance and be more reliable than the current chromedriver setup.

sparta-developers commented 5 years ago

> We could try to force this to behave better by setting limits on the hackney pool so that we don't overwhelm chromedriver as often.

@keathley we tried setting hackney's pool size to 1, which unfortunately did not help. (It's possible that we're doing it wrong, but we checked a few different ways that it was using our pool size.)

We have learned that we're getting the "connection refused" messages because chromedriver crashes but we haven't yet found anything useful in its logs.

> I haven't tried this but I wonder if the new puppet api thing that google has would allow for more performance and be more reliable than the current chromedriver setup.

Do you mean Puppeteer? We see that there are a few issues (#417, #358) about that already. One knock against it was that it's Chrome-only, but maybe it's worth doing anyway?

keathley commented 5 years ago

I'd be in favor of moving to puppeteer for chrome only and using webdriver for selenium. We don't have a ton of issues with selenium handling the webdriver stuff.

AndroidOatmeal commented 5 years ago

Any updates on this issue? This problem unfortunately causes my test suite to fail about 50% of the time right now.

michallepicki commented 5 years ago

@AndroidOatmeal Could you provide more background, or are your stacktraces exactly the same as in the issue description? Does the provided example reproduce the issue with similar rate, or maybe you could provide a different minimal example? What are your Elixir, Chrome and Chromedriver versions?

AndroidOatmeal commented 5 years ago

Sure, here are my versions

Elixir 1.8.1 (compiled with Erlang/OTP 20)
ChromeDriver 74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29})
Chrome Version 75.0.3770.100 (Official Build) (64-bit)

My errors are the same as described above. They seem to happen intermittently. Our acceptance test suite has 47 tests currently.


     ** (RuntimeError) Wallaby had an internal issue with HTTPoison:
     %HTTPoison.Error{id: nil, reason: :econnrefused}
     stacktrace:
       (wallaby) lib/wallaby/httpclient.ex:37: Wallaby.HTTPClient.make_request/5
       (wallaby) lib/wallaby/experimental/chrome.ex:79: Wallaby.Experimental.Chrome.start_session/1
       (wallaby) lib/wallaby.ex:80: Wallaby.start_session/1

AndroidOatmeal commented 5 years ago

@keathley @michallepicki @sparta-developers Do you have any suggestions for debugging this issue? I'm not afraid to get my hands dirty to figure this out, I just don't know where in the Wallaby codebase to begin.

kerryb commented 5 years ago

I'm also seeing this error. It only happens when I run tests with async: true (I'm trying to migrate away from Cabbage, which only worked with async: false). Even with just two tests running concurrently, I get the error almost every time.

Also happy to help debug, if you point me in the right direction.

keathley commented 5 years ago

I'd start by trying to trace the calls to HTTPClient and the Wallaby.Experimental.Chrome.Chromedriver process. That's the process that controls chromedriver through a port. My intuition is that we're causing chromedriver to crash or something similar.

@kerryb I think the reason that you're seeing issues when running in async: true mode is either 1) we're overwhelming chromedriver with the number of requests that we're sending it or 2) inducing some internal bug with chromedriver because it can't handle the number of independent sessions we're using. In either case we're overwhelming chromedriver but my money is on 1 being the root cause.

An easy way to test this is to limit the number of http requests we allow through at any given time. Because we're using hackney (through httpoison) a quick way to test this out would be to update the HTTPClient api to use a dedicated hackney pool and set the pool limits arbitrarily low like so:

# Add this to our supervision tree
:hackney_pool.child_spec(:wallaby_pool, [timeout: 15000, max_connections: 4])

# Update http client to use hackney pool on line 41 of http_client.ex.
# HTTPoison.request/5 takes the HTTP method first; `method` is the one already in scope there.
HTTPoison.request(method, url, body, headers(), hackney: [pool: :wallaby_pool])

That's where I would start debugging.

mhanberg commented 5 years ago

@keathley would it be beneficial to actually surface HTTPoison errors to the user?

Why did you decide to swallow them in a wallaby error?

keathley commented 5 years ago

@mhanberg Yeah we should totally be surfacing them. I'm not sure I had a reason for not surfacing them. Either way it's something we should change now.

eahanson commented 5 years ago

@mhanberg the Wallaby code does try to surface the HTTPoison errors, and it works in my experience:

https://github.com/keathley/wallaby/blob/6333301eb41dd63c03cf5ac2370087808c41cd8c/test/wallaby/http_client_test.exs#L75

When I experience this problem (which is always), it's because HTTPoison gets :econnrefused which (again in my experience) is because the chromedriver process has crashed. It crashes pretty much any time I send it a lot of requests, like when I'm running tests in parallel.

keathley commented 5 years ago

@eahanson Yup, that tracks with what I've seen in the past. I think we can try to limit the number of in-flight requests to chromedriver, which should help it stay healthy. We might need to work out a better supervision strategy for our process that manages chromedrivers as well. We might have to fail a test case if a chromedriver crashes in the middle of it, but we shouldn't need to fail the entire suite.

mhanberg commented 5 years ago

@eahanson I am seeing that support for surfacing the error was included a few months after this issue was opened.

@keathley @michallepicki I wonder if we should close this issue in favor of an issue regarding overwhelming chromedriver. What do you think?

mhanberg commented 5 years ago

Also, to address the references to Puppeteer above:

Puppeteer is a Node.js library primarily made to write web scraping scripts.

michallepicki commented 5 years ago

@mhanberg I think it would be possible to write a Wallaby driver using the Chrome Devtools Protocol through puppeteer... For example there's taiko which is a testing tool that uses it directly.

keathley commented 5 years ago

Like @michallepicki said, I think Wallaby could support the Chrome DevTools Protocol for its Chrome support. I think that's a good future direction personally, but let's set that aside for a moment and see if we can fix the current issue. For now let's try to limit the number of requests hitting chromedriver simultaneously and see if that helps keep chromedriver healthy. If that solves the immediate problem we can close this ticket and focus on moving to the Chrome DevTools Protocol as a more future-proof step.

mhanberg commented 5 years ago

@michallepicki My point being we wouldn't use Puppeteer, we would write a driver for the Chrome DevTools protocol. I am being pedantic but I want to clarify what I was saying.

mhanberg commented 5 years ago

FYI last night I started working on the supervision and http request strategy for chromedriver.

ccarvalho-eng commented 5 years ago

@mhanberg Was this fixed with the last release? (https://github.com/keathley/wallaby/commit/231230585fefece65be1ca4f7760a6c893543810)

mhanberg commented 5 years ago

@wood-archer Not explicitly, but it might occur less frequently now.

Do you mind upgrading and testing it out?

ccarvalho-eng commented 5 years ago

Sure. will give it a try. Thanks!

ccarvalho-eng commented 5 years ago

@mhanberg sorry for the long time to post back but it looks like the error Wallaby had an internal issue with HTTPoison is gone from my tests after updating from wallaby 0.20.0 to 0.23.0. Now I can only see errors like ** (Wallaby.QueryError) Expected to find 1, visible element that matched the css '# .. but most likely these are just on my end.

AndroidOatmeal commented 5 years ago

For reference, I am still experiencing this issue often, even on 0.23.0. :(

> FYI last night I started working on the supervision and http request strategy for chromedriver.

@mhanberg Any luck with this? Do you need any help with it?

mhanberg commented 5 years ago

@AndroidOatmeal I think I had some luck, but I (obviously) got distracted from this for some reason.

I'll push a branch tonight and then you can try out that branch.

Sorry about that!

mhanberg commented 5 years ago

I have opened a PR that will hopefully help address this. #463

@AndroidOatmeal would you mind trying out that branch? Thanks!

mhanberg commented 5 years ago

I merged #463. I will be able to cut a release tonight.

If you have run into this problem, please update and let me know if it fixes it for you!