andrewvy / chrome-remote-interface

Elixir Client for the Chrome Debugger Protocol
https://hexdocs.pm/chrome_remote_interface
66 stars 31 forks source link

Page closing doesn't appear to work #17

Closed bcardarella closed 7 years ago

bcardarella commented 7 years ago

Steps to reproduce:

ChromeLauncher.launch()
server = ChromeRemoteInterface.Session.new()
{:ok, page} = ChromeRemoteInterface.Session.new_page(server)

At this point you can navigate to http://localhost:9222 and confirm that there are two pages

ChromeRemoteInterface.Session.close_page(server, page["id"])

This last command returns false and by refreshing the Headless page in your browser you can confirm that there are still two pages open.

This is running against Canary. (Chrome 63)

In Chrome Beta (Chrome 62) for close_page/2 I get:

{:error, {:invalid, "T", 0}}

However, the page does close.

I cannot find documentation for the JSON API that this library is using to issue the close command. Maybe the API changed between 62 and 63?

bcardarella commented 7 years ago

Looking at the js chrome-remote-interface they appear to be doing the exact same call: https://github.com/cyrus-and/chrome-remote-interface/blob/master/lib/devtools.js#L79-L88

Still digging, could be just a canary bug

bcardarella commented 7 years ago

From what I can tell, the actual response body shouldn't matter when closing a page. The status code will be 200 if the page was closed successfully or 404 if the page was not found.

The API does appear to account for this: https://github.com/andrewvy/chrome-remote-interface/blob/master/lib/http.ex#L29 but the body response for close isn't a JSON document and will then fail.

bcardarella commented 7 years ago

More info: I believe that hackney is the culprit here. With Chrome 63 the URL for the page ids is now surrounded with ( )s. hackney normalizes this path. So something like:

http://localhost:9222/json/close/(174408FF499E3B6F16497ED20614E1D2)

will get normalized to:

http://localhost:9222/json/close/%28174408FF499E3B6F16497ED20614E1D2%29

This results in a 404 as a page of that ID is never found.

technically I believe that hackney is doing the right thing, but Chrome doesn't deserialize the ID value internally so it cannot be found.

I'll open a bug in Chromium for this, but at the moment Canary won't work for any requests using a page ID.

bcardarella commented 7 years ago

For tracking purposes: https://bugs.chromium.org/p/chromium/issues/detail?id=778263

bcardarella commented 7 years ago

I've opened a 2nd issue to track the response payload from closing a page: https://bugs.chromium.org/p/chromium/issues/detail?id=778286

The current response body is plaintext but the content-type is application/json. If the response type was corrected to text/plain or a serialized JSON body was returned we could properly decode with Poison.decode but as it stands that is not the case.

I will make a PR that can include a temporary fix to get around the error state, but it would be better for this to be properly returned upstream so the library can handle

andrewvy commented 7 years ago

A fix for the handling of successful non-json responses has been merged. and released under the 0.0.5 tag. 😄

Keeping this ticket open to track the upstream issue for >= Chrome 63.

andrewvy commented 7 years ago

A thing I did notice is that you can pass in an option for custom URL path encoding: https://github.com/benoitc/hackney/blob/master/src/hackney.erl#L285

:path_encode_fun

A simple passthrough identity fun here would solve the issue I believe. 🤔

bcardarella commented 7 years ago

ah I completely missed this. Yes, I'll try it out and PR if it works

andrewvy commented 7 years ago

This is fixed with https://github.com/andrewvy/chrome-remote-interface/pull/19 and v0.0.6 tagged. Stellar help with tracking and fixing these bugs. 👍