arnoudbuzing / webtools

A Wolfram Language package which automates interactions with web browsers

Get specified HTML element #5

Closed wjxway closed 6 years ago

wjxway commented 6 years ago

It would be great if I could write something like:

GetPageHTML[Id[*******]] or GetPageHTML[XPath[*******]]

Right now I have to do StringCases[GetPageHTML[****],***], which is redundant and inefficient.

Also, WebUnit sometimes crashes when opening a web page, returning the following: "value" /. ImportString[$Failed, "JSON"]. It doesn't work afterward, even if I change the web page manually. How can I fix this?

BTW: I LOVE WebUnit. It's so helpful when dealing with web stuff, thanks a lot~

arnoudbuzing commented 6 years ago

That should be easy, I'm looking at it now.

Always interested in hearing how people use it.

arnoudbuzing commented 6 years ago

Quick solution, use JavascriptExecute (change 'id' to what you need):

JavascriptExecute["return document.getElementById('id').innerHTML;"]

Or, as a Wolfram Language function:

GetHtmlForId[id_String] := JavascriptExecute[
  "return document.getElementById('" <> id <> "').innerHTML;"]

wjxway commented 6 years ago

Thanks a lot! JavascriptExecute is really powerful! BTW, I checked your source code, and it seems that most of the existing functions, like ClickElement, are actually implemented by calling JavascriptExecute. Am I right?

arnoudbuzing commented 6 years ago

Yes: The WebDriver protocol provides the basics (launching a browser, opening web pages, navigating pages, clicking and typing, etc.). But there are many common and useful things one might want to do, so building higher-level Wolfram Language functions based on WebDriver functions + JavaScript is a good direction to extend this package.

wjxway commented 6 years ago

Some additional methods for getting HTML:

Options[GetHtml] = {"Selection" -> "outer"};

(* first element matching a CSS selector *)
GetHtml[Selector[sel_String], OptionsPattern[]] := JavascriptExecute[
  "return document.querySelector('" <> sel <> "')." <> OptionValue["Selection"] <> "HTML;"]

(* first node matching an XPath expression *)
GetHtml[XPath[xp_String], OptionsPattern[]] := JavascriptExecute[
  "return document.evaluate('" <> xp <> "', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue." <> OptionValue["Selection"] <> "HTML;"]

and your GetHtmlForId:

GetHtml[Id[id_String], OptionsPattern[]] := JavascriptExecute[
  "return document.getElementById('" <> id <> "')." <> OptionValue["Selection"] <> "HTML;"]
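
One thing the one-liners above don't handle is a lookup that matches nothing: the script then throws on the property access instead of returning a value. Below is a rough sketch of the same three lookups as a single browser-side function that returns null in that case. The name getHtml and its parameters are made up for illustration and are not part of WebUnit; the document is passed in as an argument only so the logic can be tried outside a browser.

```javascript
// Sketch only: getHtml and its parameter names are hypothetical, not WebUnit API.
// doc:       the page's document object
// kind:      "id", "selector", or "xpath"
// selection: "inner" or "outer", mirroring the "Selection" option
function getHtml(doc, kind, value, selection = "outer") {
  let node = null;
  if (kind === "id") {
    node = doc.getElementById(value);
  } else if (kind === "selector") {
    node = doc.querySelector(value);
  } else if (kind === "xpath") {
    // 9 is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE
    node = doc.evaluate(value, doc, null, 9, null).singleNodeValue;
  } else {
    throw new Error("unknown kind: " + kind);
  }
  // Return null instead of throwing when nothing matches.
  if (node === null) return null;
  return selection === "inner" ? node.innerHTML : node.outerHTML;
}
```

Since JavascriptExecute is only shown here taking a single script string, one way to use this would be to send the function definition followed by a call such as return getHtml(document, 'id', 'main', 'inner'); in the same string. Note that a selector or XPath containing a single quote still needs escaping before it is spliced into the script.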

wjxway commented 6 years ago

Also, here are a few functions I use frequently with WebUnit:

GetPageURL[] := JavascriptExecute["return window.location.href;"]

OffAlert[] := JavascriptExecute[
  "window.alert=function(){return 1}; window.confirm=function(){return 1}; window.prompt=function(){return 1};"]

arnoudbuzing commented 6 years ago

I understand what GetPageURL is useful for. But can you explain what Off/OnAlert is for?

wjxway commented 6 years ago

Errr, I modified the code a bit; it seems the old version didn't work. OffAlert turns off all dialogs created by alert(), confirm() and prompt(). Once one of these dialogs is open, it's impossible(?) to control the web page until a human clicks the "confirm" button to dismiss it, which is annoying.
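
Since the question was about Off/OnAlert as a pair, here is a rough sketch of what the browser-side half could look like if the original dialog functions are saved so they can be put back later. The names offAlert, onAlert and the _savedDialogs property are made up for illustration, and the return values are a guess at sensible defaults (confirm answers "OK", prompt answers "cancelled") rather than the return 1 used in OffAlert above.

```javascript
// Sketch only: offAlert, onAlert and _savedDialogs are hypothetical names.
// win is the page's window object.
function offAlert(win) {
  if (win._savedDialogs) return; // already off; keep the real originals
  win._savedDialogs = { alert: win.alert, confirm: win.confirm, prompt: win.prompt };
  win.alert = function () {};                  // alert() returns nothing
  win.confirm = function () { return true; };  // act as if "OK" was clicked
  win.prompt = function () { return null; };   // act as if the prompt was cancelled
}

function onAlert(win) {
  if (!win._savedDialogs) return; // nothing to restore
  Object.assign(win, win._savedDialogs);
  delete win._savedDialogs;
}
```

The overrides only affect dialogs raised after they are installed, and they are lost when a new page loads, so offAlert would need to be run again after each navigation.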