Marmeladenbrot / Crawler

say thanks non-issue #1

Open salesiss opened 6 years ago

salesiss commented 6 years ago

We can't find your email anywhere, but we want to thank you a lot for https://github.com/ariya/phantomjs/issues/13581. We were going nuts over concurrency, timeouts, and crashes when giving PhantomJS a large list of URLs to process. Your code clarified that simply processing everything sequentially is a much better alternative to queues or callback hell, and it consumes less memory too. Yeah! THANKS

page.onLoadFinished = function(status) {
    // more here
    nextPage(); // SIMPLY MUCH NEEDED! -> 'GENIUS!!!'
};
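
For readers landing here, the sequential pattern praised above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked issue: `crawlSequentially`, `createPage`, and `onDone` are hypothetical names, and under PhantomJS `createPage` would be assumed to be `function () { return require('webpage').create(); }`. It also assumes `onLoadFinished` fires once per URL, which may not hold for pages with redirects or frames.

```javascript
// Sketch: visit URLs strictly one after another, PhantomJS style.
// `createPage` is a hypothetical factory that returns an object with
// open(url), close(), and an assignable onLoadFinished handler.
function crawlSequentially(urls, createPage, onDone) {
    var queue = urls.slice();   // copy so the caller's array is untouched
    var results = [];

    function nextPage() {
        if (queue.length === 0) {
            onDone(results);    // nothing left: hand back what was collected
            return;
        }
        var url = queue.shift();
        var page = createPage();
        page.onLoadFinished = function (status) {
            results.push({ url: url, status: status });
            page.close();       // release the page before starting the next one
            nextPage();         // only now move on to the next URL
        };
        page.open(url);
    }

    nextPage();
}
```

Only one page exists at any time, which is likely why this approach uses less memory than opening many pages at once.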

Marmeladenbrot commented 6 years ago

I'm not really active on GitHub; I only open issues and bug reports for tools I use. I also don't monitor the email account I used here (if the email is even visible, which I guess it isn't); it's only used for registering for services.

I'm glad I could help you. For the future, you may want to migrate from PhantomJS to headless Chrome with the "DevTools protocol", which allows much better resource management and probably a worker pool that uses a single tab per worker. That would be the ultimate goal for concurrency and parallelism with large URL lists.
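
The worker-pool idea could look roughly like the sketch below. It assumes a Node.js environment with async/await and deliberately leaves out the browser itself: `runPool` and `visit` are hypothetical names, and `visit` only stands in for "drive one browser tab to this URL and return a result"; it is not a real DevTools protocol call.

```javascript
// Sketch: a fixed-size worker pool over a shared URL queue.
// `visit(url, workerId)` is a hypothetical async function supplied by the
// caller; each worker would own one tab and reuse it for every URL it takes.
async function runPool(urls, workerCount, visit) {
    const queue = urls.slice();     // shared queue, copied from the input
    const results = [];

    async function worker(id) {
        // Each worker processes its URLs one at a time, like a single tab.
        while (queue.length > 0) {
            const url = queue.shift();
            try {
                const value = await visit(url, id);
                results.push({ url: url, worker: id, value: value });
            } catch (err) {
                // Record the failure and keep going with the next URL.
                results.push({ url: url, worker: id, error: String(err) });
            }
        }
    }

    const workers = [];
    for (let id = 0; id < workerCount; id++) {
        workers.push(worker(id));
    }
    await Promise.all(workers);     // resolves once every worker has drained the queue
    return results;
}
```

At most `workerCount` visits are in flight at once, and results arrive in completion order rather than input order.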

Besides that, development of PhantomJS has ended because of headless Chrome, so the migration path is quite clear if you need new features in the future.

The Chrome team is also talking to the Firefox team about supporting the same headless protocol, so you may be able to use the same code for Chrome and Firefox in the future if needed.

Best regards and happy crawling,
Marmeladenbrot