DaneelM1993 / pywebkitgtk

Automatically exported from code.google.com/p/pywebkitgtk

Should be able to get the html string for WebView? #4

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I want to get the HTML string from a WebView.
Is there any method like get_html?

Thanks.

Original issue reported on code.google.com by jhuangjiahua@gmail.com on 12 Jul 2008 at 7:08

GoogleCodeExporter commented 9 years ago
Well, I use this:

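{{{
import webkit  # pywebkitgtk bindings

class WebView(webkit.WebView):
    def get_html(self):
        # Save the page title, then copy the document markup into document.title
        self.execute_script('oldtitle=document.title; '
                            'document.title=document.documentElement.innerHTML;')
        # The title (now holding the markup) is readable from Python via the main frame
        html = self.get_main_frame().get_title()
        # Restore the original title
        self.execute_script('document.title=oldtitle;')
        return html
}}}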

Original comment by jhuangjiahua@gmail.com on 12 Jul 2008 at 9:02

GoogleCodeExporter commented 9 years ago
jhuangjiahua,

Can you please raise an API request at http://bugs.webkit.org (WebKit Gtk component)?
This is a WebKitGtk issue. We'll wrap it once it's ready in WebKitGtk.

n.b. I'm going to add your work-around to an FAQ document.

I'll close this as WONTFIX. Kindly reopen if you really think we need to handle this.

Thanks

Original comment by jmalo...@gmail.com on 18 Mar 2009 at 5:22

GoogleCodeExporter commented 9 years ago
Looking at the way Firefox, Epiphany, Midori and Arora do things (I haven't looked at Opera or Kazehakase, etc.), it seems the standard way to get the source is to grab the page separately from the rendering widget (even Gecko's view-source pseudo-protocol re-downloads the page). So it looks like libsoup or gnet or some other network library should be used to retrieve the source, though it might be tricky to get right with session cookies, HTTP POST and so forth. It appears from Apple's docs that WebView exposes a DOMDocument, which the GTK port doesn't seem to implement yet. This could probably be queried to retrieve the page source. I've never looked at the implementation to see if that's actually possible, but given that JSCore supports innerHTML, I'm guessing it would be.
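The "download the page again" approach can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it swaps in the standard library's `urllib.request` in place of libsoup or gnet, and the helper name `fetch_page_source` and its parameters are hypothetical, not part of pywebkitgtk. Session cookies are only sent if you pass an `http.cookiejar.CookieJar` that already holds them, and passing `post_data` re-submits the request as a POST, which may have side effects on the server. The result is the source as served, not the live DOM the widget is currently rendering.

```python
import http.cookiejar
import urllib.request


def fetch_page_source(url, cookie_jar=None, post_data=None, fallback_encoding="utf-8"):
    """Download url again, independently of the rendering widget.

    Hypothetical helper: urllib.request stands in for libsoup/gnet here.
    cookie_jar -- optional http.cookiejar.CookieJar whose cookies are sent
                  with HTTP(S) requests (e.g. a copied session cookie).
    post_data  -- optional bytes; when given, the request becomes a POST.
    """
    handlers = []
    if cookie_jar is not None:
        # Attach the cookie jar so HTTP(S) requests carry its cookies
        handlers.append(urllib.request.HTTPCookieProcessor(cookie_jar))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, data=post_data) as response:
        raw = response.read()
        # Use the charset the server declared, if any, else the fallback
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset, errors="replace")
```

Because this is a separate download, pages that were produced by a POST or that depend on browser state may come back different from what the WebView shows, which is the trickiness mentioned above.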

Original comment by MonkeeS...@gmail.com on 23 Apr 2009 at 1:53

GoogleCodeExporter commented 9 years ago
Or, using the code in #28, you can access the DOM directly through a JSContext! I haven't tried it yet, but it looks nice.

Original comment by MonkeeS...@gmail.com on 23 Apr 2009 at 1:57

GoogleCodeExporter commented 9 years ago
Using the jswebkit code in bug #28 (I've now tried it; it requires the patch in the comments for UCS-4 Python builds), this can be written as:

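{{{
import jswebkit  # from the code attached to bug #28

def get_html(self):
    # Wrap the main frame's JavaScriptCore global context
    frame = self.view.get_main_frame()
    ctx = jswebkit.JSContext(frame.get_global_context())
    # Evaluate the expression in the page and return its string result
    return ctx.EvaluateScript("document.documentElement.innerHTML")
}}}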

Original comment by MonkeeS...@gmail.com on 25 Apr 2009 at 7:06