aisingapore / TagUI

Free RPA tool by AI Singapore
Apache License 2.0
5.65k stars, 585 forks

Why PhantomJS/CasperJS is used? #16

Closed LeMoussel closed 7 years ago

LeMoussel commented 7 years ago

Why are PhantomJS and CasperJS used?

There is Nightmare, a high-level browser automation library. Under the covers it uses Electron, which is similar to PhantomJS but roughly 2 times faster and more modern. There is also a Chrome extension, Daydream, that records and generates Nightmare scripts for you while you browse.

kensoh commented 7 years ago

Hi @LeMoussel, thank you for raising this question. Yep, I'm aware of Nightmare (driving Electron underneath) and the Daydream Chrome extension. I also know a financial startup from Budapest that uses Nightmare as its data-scraping solution.

The reason for using CasperJS/PhantomJS is partly personal and partly technical (but please correct me if my understanding is missing something). I got started with web automation using CasperJS 2 years ago and wrote scripts manually for a long time, before I thought that maybe I could write something to semi-automate the process of coding scripts. That's why I started on TA.Gui. So I'm familiar enough with CasperJS to code a wrapper that converts 'natural language' syntax to CasperJS JavaScript syntax.
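To make the wrapper idea concrete, here is a minimal sketch of translating plain-language steps into CasperJS calls. It is only an illustration, not TA.Gui's actual parser (which is written in PHP): the three step verbs (`open`, `click`, `echo`) and the function names `translate_step` and `translate_script` are hypothetical. The generated calls (`casper.start`, `casper.thenOpen`, `casper.thenClick`, `casper.then`, `this.echo`, `casper.run`) are standard CasperJS methods.

```python
import json


def translate_step(step: str) -> str:
    """Translate one hypothetical plain-language step into a CasperJS statement."""
    verb, _, arg = step.strip().partition(" ")
    # json.dumps yields a double-quoted, escaped literal usable as a JS string.
    quoted = json.dumps(arg.strip())
    verb = verb.lower()
    if verb == "open":
        return f"casper.thenOpen({quoted});"
    if verb == "click":
        return f"casper.thenClick({quoted});"
    if verb == "echo":
        return f"casper.then(function() {{ this.echo({quoted}); }});"
    raise ValueError(f"unsupported step: {step!r}")


def translate_script(steps) -> str:
    """Wrap the translated steps in the usual CasperJS start/run boilerplate."""
    body = [translate_step(s) for s in steps if s.strip()]
    return "\n".join(["casper.start();", *body, "casper.run();"])


if __name__ == "__main__":
    print(translate_script([
        "open https://example.com",
        "click #login",
        "echo done",
    ]))
```

Because every step maps to a fixed CasperJS call, the outer layer only breaks when those underlying calls change, which is why the stability of the wrapped tools matters so much.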

The technical reason is that CasperJS, PhantomJS and PHP (the bulk of TA.Gui's code, for parsing and API services) are all very old technologies. The reason for choosing very old, and therefore very mature, technologies is that they are relatively more stable than newer ones. While newer technologies see frequent code changes/commits for improvements, more mature ones don't iterate as often.

That is a big bonus, because if I'm making something that wraps around something that wraps around something, the inner layers had better be as rock solid and non-changing as possible. Otherwise I will have to keep chasing after their new releases or risk the outermost layer of the wrapper breaking.

As for the difference in performance, from my experience the bottleneck during automation execution is network latency (sending and receiving requests/data between the web browser and the web server), so I didn't think optimizing the speed of the tools themselves would bring significant benefits. I believe only a small % of the total execution time is overhead of the tools. The overwhelmingly large % of execution time is spent waiting for target web-apps to respond.
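A rough way to check this reasoning is an Amdahl's-law style estimate. The sketch below assumes, purely for illustration, that tool overhead is 10% of total run time (this figure is not measured) and takes the "2 times faster" claim from the question at face value. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(tool_fraction: float, tool_speedup: float) -> float:
    """Estimate total speedup when only the tool's share of run time gets faster.

    tool_fraction: share of total run time spent in the tool itself (0..1).
    tool_speedup:  how many times faster the replacement tool is (> 0).
    """
    if not 0.0 <= tool_fraction <= 1.0:
        raise ValueError("tool_fraction must be between 0 and 1")
    if tool_speedup <= 0:
        raise ValueError("tool_speedup must be positive")
    # Network wait is unchanged; only the tool's share shrinks.
    new_time = (1.0 - tool_fraction) + tool_fraction / tool_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed 10% tool overhead, tool twice as fast.
    print(round(overall_speedup(0.10, 2.0), 3))
```

Under these assumed numbers the whole run becomes only about 5% faster, since the other 90% is still spent waiting on the network.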

I'll add the feedback label to this issue and close it, but do continue sharing your thoughts!