andrewvy / chrome-remote-interface

Elixir Client for the Chrome Debugger Protocol
https://hexdocs.pm/chrome_remote_interface

CRI seems to be working differently outside of iex shell #35

Closed pedroseabra1091 closed 4 years ago

pedroseabra1091 commented 4 years ago

I am using Chroxy and ChromeRemoteInterface to fetch HTML, which I then parse with Floki. When I get the outer HTML inside iex, Floki finds the elements I want. Outside of iex, however, Floki is not able to find anything.

Here is a sample of the code I currently have:

ws_addr = Chroxy.connection()
{:ok, page} = ChromeRemoteInterface.PageSession.start_link(ws_addr)
ChromeRemoteInterface.RPC.Page.enable(page)
ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self())
ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
{:ok, dom} = ChromeRemoteInterface.RPC.DOM.getDocument(page)
nodeId = dom["result"]["root"]["backendNodeId"]
{:ok, %{"result" => result}} = ChromeRemoteInterface.RPC.DOM.getOuterHTML(page, %{backendNodeId: nodeId})
pre_selected_content = Floki.find(result["outerHTML"], "div.productBoxTop")

Any suggestions?

andrewvy commented 4 years ago

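One thing is that ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self()) subscribes to the event and forwards it to the subscribed process, but doesn't block. Your code therefore calls DOM.getDocument right after navigate, probably before the page has finished loading. In iex, the delay between entering each line likely gives the page enough time to load, which would explain the difference. (This library is fairly bare-bones and low-level. I know it isn't ideal, and I'd like to add a better option for synchronous execution. :cry:)

You can add a receive block right after the navigate call that waits for the Page.loadEventFired event: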

receive do
  {:chrome_remote_interface, "Page.loadEventFired", _} -> :ok
after
  10_000 -> {:error, :timeout}
end
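
Putting that together with the snippet from the question, a minimal sketch might look like the following. The module name `PageFetcher` and the functions `await_event/2` and `outer_html/2` are hypothetical names made up for illustration, not part of the library; the ChromeRemoteInterface and Chroxy calls are the ones already used above, and the message shape is the one matched in the receive block.

```elixir
defmodule PageFetcher do
  # Hypothetical wrapper module, not part of chrome_remote_interface.
  @load_event "Page.loadEventFired"

  # Blocks the calling process until `event` is forwarded to it,
  # or returns {:error, :timeout} after `timeout` milliseconds.
  def await_event(event, timeout \\ 10_000) do
    receive do
      {:chrome_remote_interface, ^event, payload} -> {:ok, payload}
    after
      timeout -> {:error, :timeout}
    end
  end

  # Same calls as in the original snippet, with the wait added after navigate.
  def outer_html(url, timeout \\ 10_000) do
    alias ChromeRemoteInterface.{PageSession, RPC}

    {:ok, page} = PageSession.start_link(Chroxy.connection())
    RPC.Page.enable(page)
    # Subscribe before navigating so the load event cannot be missed.
    PageSession.subscribe(page, @load_event, self())
    RPC.Page.navigate(page, %{url: url})

    with {:ok, _event} <- await_event(@load_event, timeout),
         {:ok, dom} <- RPC.DOM.getDocument(page),
         node_id = dom["result"]["root"]["backendNodeId"],
         {:ok, %{"result" => result}} <-
           RPC.DOM.getOuterHTML(page, %{backendNodeId: node_id}) do
      {:ok, result["outerHTML"]}
    end
  end
end
```

With this in place, `{:ok, html} = PageFetcher.outer_html(url)` followed by `Floki.find(html, "div.productBoxTop")` should behave the same inside and outside of iex, assuming the content is present once the load event fires. Content rendered later by JavaScript may still need an extra wait.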

The synchronous API discussion is at https://github.com/andrewvy/chrome-remote-interface/issues/11. I'd like to hear your thoughts, if you have any :)

pedroseabra1091 commented 4 years ago

Thanks for the help! Unfortunately, I don't have any suggestions 😞 However, I do think this information would be helpful in the README 😛

andrewvy commented 4 years ago

I agree! There's really only a little bit of documentation in https://hexdocs.pm/chrome_remote_interface/ChromeRemoteInterface.PageSession.html#subscribe/3. It's not a pleasant API to work with right now, apologies.

If it's okay with you, I'm going to mark this closed for now. :+1: