Closed: ariya closed this issue 6 years ago
ariya.hi...@gmail.com commented:
All I can say is that the effort would be non trivial. However, we should definitely keep an eye (or two) on this.
In all cases, we need to increase the team size first before embracing this adventure. Even if the backend is finally there, there is a significant cost with respect to the effort to maintain it.
ariya.hi...@gmail.com commented:
There are projects like Chromium Embedded, Berkelium, and Awesomium (closed source) which can be another source of ideas and inspiration.
a...@mrgray.com commented:
My 2 cents: I think a good plan of attack would be using the Webkit2PNG foundations, etc., and using the native Cocoa / similar APIs to get the same access to the DOM as is being achieved through Qt4. A lot of people don't have Qt4, and don't particularly want it, lol.. and WebKit has SO MANY native APIs to get at the DOM... à la shouldChangeSelectedDOMRange etc...
I don't spend too much time on this board... so this might be child's play.. but the attached script does quite a bit on top of webkit2png. It also saves the thumbnails, blah blah blah, and it creates a nice little XML page and form map, like ...
Small dependencies... try: import Foundation import WebKit import AppKit import objc import urllib
posting it here in case it's of use....
ariya.hi...@gmail.com commented:
The Qt problem is a moot point, especially with the static build script (see issue 197 and issue 142) and see also issue 226.
The use of the Mac WebKit library via Cocoa does not help much for non-Mac users. We certainly don't want to end up implementing 2 versions of the PhantomJS API: one for Mac and one for the rest of the world.
@thomasbachem: We concur, this seems like the way to go! We've already been chatting about it on Twitter, and Ariya just posted a new message to the mailing list.
GitHub's Atom might offer an easier way to harness Chromium: https://github.com/atom/atom-shell
@jokeyrhyme Hardly changes anything. PhantomJS needs something deeper, i.e. the already-available Chromium Content Shell.
@ariya there's the Atom announcement post here: http://blog.atom.io/2014/05/06/atom-is-now-open-source.html
Finally, we're just as excited to be open-sourcing Atom Shell as we are about Atom itself. Over its 2.5 years of development, Atom has been something of a hermit crab, beginning its life in a Cocoa WebView, then migrating to the Chromium Embedded Framework, and finally making its permanent home inside Atom Shell. We experimented briefly with Node-Webkit, but decided instead to hire @zcbenz to build the exact framework we were imagining.
We've taken great care to integrate Chromium and Node in a clean, maintainable way, including sponsoring the addition of multi-context support in Node. We also created brightray and libchromiumcontent, which make it easier to embed Chromium into native applications as a shared library.
Are brightray and/or libchromiumcontent of any use in this endeavour?
@jokeyrhyme Maybe yes, maybe not (too early to say). While I appreciate the information, what is useful for this adventure is not a list of all Chromium-related projects out there. A thorough technical analysis is going to be more valuable.
Is QtWebEngine an option? PhantomJS 2.0 builds on top of Qt 5.3, but I had the impression that Qt 5.4 is the latest stable release.
Not being so tied to a particular, patched version of Qt(/Webkit) seems like a necessary prerequisite for being able to swap out the rendering engine more easily.
For what I do with PhantomJS, being able to match the rendering engine of a particular release version of Safari or Chrome would be quite valuable, but that's an even bigger can of worms.
Did you decide yet which backend you'll use in the future ? QtWebEngine or Electron or another ?
@hadim not yet. We haven't had time for that.
Hey all,
I just investigated Qt WebEngine [QWE] for the PhantomJS use-case a bit. Here are my notes:
a) QWE cannot be statically built. Especially considering that it relies on a multi-process architecture, it would be hard or impossible to get a single phantomjs binary that can be deployed to servers. The --single-process mode may help, but this can make things complicated.
b) there is currently no real printing support. It should be simple to get hands on a PDF generated by chromium, and forward that to PhantomJS though. No idea about PNG etc. screenshots though.
c) QWE depends on Qt Quick & QML for the scene graph. This is a big dependency, and drags in OpenGL etc. pp. It could be quite cool to get rid of many PhantomJS parts by just reusing QML (it is a JavaScript runtime after all). But the OpenGL dependency is unfortunate for running PhantomJS on the cloud. Might be remedied via either mesa software rendering or the commercial 2D painter.
I will extend this once I know more.
Cheers
@milianw I tried to investigate this too (and am still continuing, though). I've found that QWE (and Chromium) don't support a full headless mode, can you confirm that? The only way to run it headless is to run it with Xvfb.
Thanks.
@Vitallium yep, headless is out of the question as it depends on OpenGL which in turn depends on XCB etc.
I'll have a look at CEF (chromium embedded framework) now. I have the feeling that it's a better choice for the future of PhantomJS. It's just a minimal wrapper (or so I hope) around Chromium. We don't need most of Qt for PhantomJS, just Chromium should be enough.
Modules like the web browser or file system in PhantomJS can/should probably be replaced/removed. Node.js and others already fill that hole well enough, imo.
The ideal outcome, imo, would be a minimal remote-controlled browser that integrates well with node.js. Do you agree?
@milianw yes, I do. I think in the same way actually. We don't need Qt at all. Chromium's code base should be enough to handle all our requirements and needs. Ideally, I want to make it like Electron, but with our API and other stuff.
Lemme note down some things about what I use PJS for, what's hard now and what needs to not break. For reference, this is my controller script: https://github.com/zackw/tbbscraper/blob/master/collector/scripts/pj-trace-redir.js (The name is no longer meaningful.)
Things that need to not break:
- --load-images=no absolutely must keep working; I'm poking a lot of very sketchy websites and do not want my database seized for copyright violation or whatever.
- file: URLs, that also needs to keep working.
Things that are hard or impossible now:
- getaddrinfo() operation, or even better, detailed DNS packet decodes. This currently isn't possible; I spent two days once and couldn't even find where Qt calls getaddrinfo.
- page.onLongRunningJavascript - gosh, it would be nice if that worked.
- onload time for the page, but at a point when JavaScript is "done executing" in the page (for some value of "done executing" that isn't the Halting Problem in disguise). Right now I have a hardwired timeout.
I have a use case similar to @zackw; besides the DNS queries and load-images (I want to fetch/check those too), I need deep access to the networking system and all the other features, which I more or less implemented in a long JavaScript scraper.
@Vitallium As soon as you have something that I can help with, please do let me know.
@zackw well, at this moment I'm just playing with Chromium and WebKit. Each engine has its pros and cons. With WebKit we can guarantee headlessness for users, but it doesn't have OS-specific features like file system or image handling. With Chromium we have everything that we need, but Chromium is insanely huge and complex. Sometimes I have no idea what I'm doing. And Chromium doesn't support headless mode. This is a very important thing.
Is it necessary to support environments that do not have Xvfb? I understand having fewer dependencies is preferred, but how is requiring Xvfb for Chromium better / worse than the complexity of batteries-not-included WebKit?
If we can identify an important use case where Xvfb is infeasible, then that seems like a way to exit early from the Chromium approach.
FYI, I'm pushing my WIP to https://github.com/KDAB/phantomjs-cef - of course it is currently not functional at all. But from what I've seen so far, it looks good. There is some sort of offscreen rendering, which I haven't implemented yet. The settings are very extensive, and disabling SSL error checks, image downloading, web security, etc. pp. should work just fine.
Tomorrow, I'll try to get PDF printing done, which I haven't figured out yet. Then, I'll tackle the JavaScript bindings to get the good old PhantomJS behavior up and running again. @Vitallium, or anyone else: if you want to chime in, i.e. add patches - you are more than welcome!
I think getting a first proof of concept done in a scratch repo would be a good idea, then we can think about how to integrate it with the upstream repo.
Some issues I've had so far:
@milianw You rock! :+1:
I think, since we're going to use Chromium, we don't need to focus on static builds. Let's try with shared builds first. I'll start playing with CEF from now on.
PS: After a few tweaks we can run it on Windows :-)
Hell yeah, gentlemen! :clap:
Long live, Phantomium! :ghost: :crown:
@milianw FYI: I'm working on Windows branch here: https://github.com/Vitallium/phantomjs-cef/tree/windows
@Vitallium: I'm playing around with the JavaScript bindings now, i.e. bootstrap.js and require() etc. pp. I'm realizing that I'd really like to have a cross platform resource system. The cefclient example does something along those lines, but the resources are only embedded on Windows, not on Linux.
That, and the cefclient example using GTK for its OSR rendering, makes me wonder whether we should integrate QtBase with CEF. We all have experience with Qt, and using an unpatched Qt 5 base as an additional dependency to CEF for cross platform resource systems and painting sounds like a good idea to me. I've found https://github.com/joinAero/qtcefclient which is outdated and apparently Windows only, but it shows that CEF + Qt is possible.
The advantage over Qt WebEngine is that this does not include Qt Declarative (i.e. Qt Quick + QML) as a dependency.
What do you guys say?
@milianw I'm playing with it too and I came to the same thing. We really need a cross platform resource system.
I think about the same system as Node.js has. Generate all headers with all included modules (bootstrap, require, etc.).
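To make the "generate headers" idea concrete, here is a rough sketch of such a build step, written as a Node.js script. It is only an illustration under my own assumptions: the function name toCppHeader, the module names, and the generated identifiers are all made up, and it only loosely follows the way Node.js embeds its built-in JavaScript at build time.

```javascript
// Hypothetical sketch: turn JavaScript module sources into the text of a C++
// header with one byte array per module. All names here are illustrative.
function toCppHeader(modules) {
  const lines = ['// Generated file, do not edit.', '#pragma once', ''];
  for (const [name, source] of Object.entries(modules)) {
    // Encode the source as UTF-8 bytes so any character survives embedding.
    const bytes = Array.from(Buffer.from(source, 'utf8'));
    // Derive a valid C identifier from the module file name.
    const ident = name.replace(/[^A-Za-z0-9_]/g, '_');
    // A trailing 0 keeps the array non-empty and usable as a C string.
    const body = [...bytes, 0].join(', ');
    lines.push(`static const unsigned char ${ident}_source[] = { ${body} };`);
    lines.push(`static const unsigned int ${ident}_source_len = ${bytes.length};`);
    lines.push('');
  }
  return lines.join('\n');
}

// Usage: in a real build the sources would be read from disk with fs.
console.log(toCppHeader({
  'bootstrap.js': 'var phantom = {};',
  'require.js': 'function require(name) {}',
}));
```

The generated header could then be compiled into the binary on every platform, so no platform-specific resource mechanism would be needed.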
About QtBase: if I understand you correctly, you want to add a dependency on QtBase to achieve the following goals:
Is this correct? If so, I don't mind. But that means we have to integrate and handle an additional message loop that comes with QApplication. We can use the qtcefclient project as a starting point to implement it.
Yes, that is exactly what I have in mind. I'll start playing with QtBase + CEF now, and see how I can integrate the message loops.
Great. Then I'll start with... Err... I'll find something!
Using RCC is simple, and it should be similarly trivial to use QPainter or Qt OpenGL abstractions where needed. What we don't get though is a nice integration between the eventloops. For my use case, that isn't really required yet so I simply don't run Qt's eventloop for now.
The big next task will be to get the bootstrap.js and webpage.js to work...
Is there no equivalent means of achieving the same as what QtBase provides in Node.js? I'd really love to see Phantomium be closer to pure Node.js + CEF with a merged V8 event loop so that we can enable all consumers to use standard Node.js modules and paradigms rather than having to learn the quirks of a Qt environment... but, admittedly, I don't fully understand what QtBase is providing us.
Potentially, one could investigate how to wrap CEF in a node module. Instantiating CEF and spawning its subprocesses from a thread may work.
But right now, I just want to get something done, and as quickly as possible. Having to learn node.js internals would hold me up more. I use QtBase currently for:
All of the above may, or may not, work with node. Esp. if it's using the STL (which it hopefully does!), then it may lead to the odd crash I note above...
In the future, I will also use Qt to implement the offscreen rendering, and I doubt node has anything to offer in that regard.
Now, having answered the above, here is a quick status update:
I finally got webpage.open, close, evaluate implemented! A big caveat is that the synchronous API to evaluate JavaScript in a webpage is not supported anymore. Chromium, like WebKit2, and thus also CEF, is using a multiprocess architecture for stability and performance reasons. IPC is inherently asynchronous, and I'm reluctant to add a blocking API like page.evaluate.
Instead, I opted for a completely async approach, i.e. stuff like
page.evaluate(
  function() { return window.location.hostname; },
  function(ret) { console.log(ret); },
  function(errorCode, errorMessage) { console.log(errorCode, errorMessage); }
);
You guys are all better JS developers than me. So: What's the current best practice for designing an async API in JavaScript? Is the above good enough? Or should one rather apply some continuation pattern with .then()? Should it be two callbacks for success/error, or one to handle both?
By slightly adapting the followers.js example, I could already run it with the cef-phantomjs, which is pretty neat I think.
Promises are in ECMAScript and Node.js now, and many upcoming improvements to W3C Web Platform APIs like fetch() and getUserMedia() are Promise-based.
That said, the CommonJS pattern where a single callback is passed, with an Error object as the first argument in case of error, is a pretty expected pattern within the Node.js community. E.g.
page.evaluate(
  function() { return window.location.hostname; },
  function(err, ret) {
    if (err) { /* TODO: handle error */ return; }
    console.log(ret);
  }
);
If possible, it'd be terrific to support both. I personally prefer Promises, but as the intended use of this is within the Node.js community, I think the error-first callback is probably the mandatory pattern here. It is possible to author functions in a way that they both return a Promise and accept a callback. And I think there are even utility libraries that facilitate this style.
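A minimal sketch of that dual style, in case it helps the discussion. Everything here is an assumption for illustration: dualApi and fakeEvaluate are made-up names, and fakeEvaluate is only a stand-in that runs the function locally, not the real page.evaluate.

```javascript
// Hypothetical helper (not part of PhantomJS or phantomjs-cef): wraps a
// Promise-returning function so it also accepts an error-first callback.
// `arity` is the number of regular arguments; a function passed at that
// position is treated as the callback.
function dualApi(promiseFn, arity) {
  return function (...args) {
    const callback = typeof args[arity] === 'function' ? args[arity] : null;
    // Promise.resolve().then(...) turns synchronous throws into rejections.
    const promise = Promise.resolve().then(() =>
      promiseFn.apply(this, args.slice(0, arity))
    );
    if (!callback) {
      return promise; // Promise style
    }
    // Callback style: error first, result second.
    promise.then(
      (result) => callback(null, result),
      (err) => callback(err)
    );
    return undefined;
  };
}

// Stand-in for an asynchronous evaluate(); it just runs the function later.
const fakeEvaluate = dualApi(function (fn) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try { resolve(fn()); } catch (err) { reject(err); }
    }, 0);
  });
}, 1);

// Both call styles work against the same function.
fakeEvaluate(() => 6 * 7).then((ret) => console.log('promise:', ret));
fakeEvaluate(() => 6 * 7, (err, ret) => console.log('callback:', err, ret));
```

The explicit arity matters for an API like evaluate, where the first regular argument is itself a function and so cannot be told apart from a callback by type alone.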
Thanks for the hint, @jokeyrhyme! Promises work a treat:
https://github.com/KDAB/phantomjs-cef/blob/master/examples/load_promise.js
Just a heads up: The last week was pretty productive in phantomjs-cef land and most important features have landed: render, renderBase64, sendEvent, evaluate, injectJs, ...
I especially like how well PhantomJS(-CEF) works with a Promise driven API:
https://github.com/KDAB/phantomjs-cef/blob/master/examples/tui.js
Note how there are no explicit timeouts, rather DOM polling is wrapped in a Promise via https://github.com/KDAB/phantomjs-cef/blob/master/examples/libs/waitForDomElement.js and the new page.waitForLoaded() also uses a promise to wait until a page has finished loading after submitting the form.
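For readers who don't want to open the link, the general idea of wrapping polling in a Promise can be sketched roughly like this. This is not the code from waitForDomElement.js: pollUntil, its parameters, and the attempt limit are assumptions made for illustration, and the usage simulates the condition with a plain variable instead of a real DOM query.

```javascript
// Hypothetical sketch: re-checks a condition every `intervalMs` milliseconds
// and resolves with the first truthy result. The predicate may return a
// value or a Promise, so an asynchronous page query would also fit.
function pollUntil(predicate, intervalMs = 100, maxAttempts = 50) {
  return new Promise((resolve, reject) => {
    let attempts = 0;
    const check = () => {
      attempts += 1;
      Promise.resolve()
        .then(predicate)
        .then((result) => {
          if (result) {
            resolve(result);
          } else if (attempts >= maxAttempts) {
            // Assumed safety net so a condition that never holds cannot poll forever.
            reject(new Error(`condition not met after ${attempts} attempts`));
          } else {
            setTimeout(check, intervalMs);
          }
        })
        .catch(reject); // a throwing or rejecting predicate ends the polling
    };
    check();
  });
}

// Usage: the "element" appears after a short delay, simulated with a flag.
let elementPresent = false;
setTimeout(() => { elementPresent = true; }, 30);
pollUntil(() => elementPresent, 10).then(() => console.log('element found'));
```

The calling script then only chains on the returned Promise, which is what keeps hardwired sleeps out of the scripts themselves.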
I'll probably spend a bit of time on Windows support over the next few days. In general, I think this is a very promising result already, and I invite more people to join the effort.
If anyone wants to test PhantomJS-CEF on Windows, I just pushed a first build: https://github.com/KDAB/phantomjs-cef/releases/tag/v0.1.0-alpha
use at your own risk of course ;-) But I'd appreciate any feedback.
Hey! Good job! I have Windows build too. But, hell, that was a really busy week for me. But now I'll help as much as I can. Cheers!
But one question: what about OS X build? I'm not an expert in it.
@Vitallium you have a lot of Windows experience, right? Could you have a look at the debug build of PhantomJS-CEF on Windows? See: http://www.magpcss.org/ceforum/viewtopic.php?f=6&t=13578&p=28331#p28331
I built it against the FOSS Qt 5.5.0 release (msvc2013_64) using
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..
ninja
phantomjs-cef\phantomjs ..\examples\load_promise.js
and it will assert (see forum message). Can you reproduce that issue? Do you know what's going on there? Maybe we'll need to build CEF from sources or something?
@milianw so far I use a debug version, and I don't see any assertions, except the one on the exit. But let me try a fresh copy of your repository.
Build was updated to use a static Qt and MSVC runtime.
Given the limited resources, looks like we're stuck with QtWebKit for the foreseeable future.
nikolay....@gmail.com commented:
Disclaimer: This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #209. :star2: 7 people had starred this issue at the time of migration.