EFForg / phantom-of-the-capitol

GNU Affero General Public License v3.0

phantomjs alternative #124

Open wioux opened 7 years ago

wioux commented 7 years ago

Should we find an alternative to phantomjs? The maintainer has stepped down.

mfb commented 7 years ago

There is now Firefox headless (https://mykzilla.org/2017/08/30/headless-firefox-in-node-js-with-selenium-webdriver/) or, I guess more popularly, Chrome headless.

j-ro commented 7 years ago

We'd support a move. I've heard Chrome headless is much faster. That said, we unfortunately have no development time to devote to this at the moment. I also don't feel a ton of urgency about it; we don't even keep up with phantom updates as it is.

wioux commented 7 years ago

Do we still need to support webkit/watir? REQUIRES_WAITIR is empty, and all the bioguide ids in REQUIRES_WEBKIT belong to House members, so we can clear that out. I'm not sure what the original need for the alternative drivers was, though, or whether it might come up again. We could really simplify parts of the app if we removed support for those drivers.

j-ro commented 7 years ago

I think that's probably fine over here, yeah...

ghost commented 7 years ago

https://github.com/GoogleChrome/puppeteer

j-ro commented 6 years ago

Has anyone started work on this?

wioux commented 6 years ago

Not yet @j-ro.

j-ro commented 6 years ago

Thanks @wioux, us either, though it's starting to become more important for us. I'll let you know if it lands on my roadmap. Can you do the same, so we don't duplicate work?

wioux commented 6 years ago

Definitely, I'll let you know.

j-ro commented 6 years ago

We're actually doing a bit of initial investigation work on this today, maybe tomorrow too. We'll let you know how it goes. There may be a drop-in replacement that works with Capybara; if so, this will be fairly easy.
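
For readers wondering what a Capybara drop-in could look like, here is a minimal sketch. It assumes the capybara and selenium-webdriver gems and a chromedriver binary on the PATH; the helper name `register_headless_chrome` and the driver name `:headless_chrome` are hypothetical and not taken from this repo.

```ruby
# Hypothetical helper, not part of phantom-of-the-capitol.
# Registers a headless Chrome driver with Capybara under the given name.
def register_headless_chrome(name = :headless_chrome)
  # Loaded lazily so the file can be required without the gems installed.
  require "capybara"
  require "selenium-webdriver"

  Capybara.register_driver(name) do |app|
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument("--headless")    # run Chrome without a visible window
    options.add_argument("--disable-gpu") # commonly paired with --headless
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
  end
end
```

After calling `register_headless_chrome`, a session could opt in with `Capybara.current_driver = :headless_chrome`. Whether this behaves the same as the existing phantomjs setup for every form is exactly what would need testing.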

j-ro commented 6 years ago

Update here -- we have chromedriver running, but it's probably not quite ready for prime time. It works, but we're seeing some hard-to-debug timeout errors, and it's missing some features like blacklists. We're going to run it as an optional switch for certain yamls since it helps in some cases, but we're not going to switch entirely. If there's a large appetite for the code we can put together a PR, but it's very much a WIP.
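
A per-yaml switch like the one described could be sketched roughly as follows. This is an illustration only: the top-level `driver` key, the driver names, and the `driver_for` helper are assumptions, not the format or code actually used here.

```ruby
require "yaml"

# Hypothetical names: the real app may use different driver symbols.
DEFAULT_DRIVER  = :poltergeist
ALLOWED_DRIVERS = %i[poltergeist headless_chrome].freeze

# Returns the driver requested by a member's yaml, falling back to the
# default when the (assumed) "driver" key is missing, empty, or unknown.
def driver_for(yaml_text)
  config = YAML.safe_load(yaml_text)
  config = {} unless config.is_a?(Hash)
  requested = config.fetch("driver", DEFAULT_DRIVER).to_s.to_sym
  ALLOWED_DRIVERS.include?(requested) ? requested : DEFAULT_DRIVER
end
```

Falling back to the default for anything unrecognized keeps the existing phantomjs path as the norm, so only yamls that explicitly opt in would run under chromedriver.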

k-stewart commented 6 years ago

Hey @j-ro, this is becoming more important for us. Have you found a solution you like?

j-ro commented 6 years ago

No, we're still with phantom. Chromedriver works but not as consistently, and it doesn't have many hooks and options to debug and tune. We haven't looked at it since January, maybe that's changed, but we're not planning a switch.

k-stewart commented 6 years ago

Ok, thanks for the insight. I'll see if anything's changed since then.

j-ro commented 6 years ago

Worth a shot -- it didn't really take us very long at all to drop in Chromedriver -- the hard part was getting it to work reliably.

ghost commented 5 years ago

I'll chime in with my experience, as I have worked with Puppeteer, PhantomJS, and various Selenium WebDriver implementations such as chromedriver and geckodriver. Puppeteer provides a high-level API that is quite easy to work with for basic scraping, and its documentation is extensive. If you need to get something done quickly, I think it is a strong contender. As far as I know, it is a JavaScript-only API.

Selenium WebDriver implementations give you more flexibility in choosing the browser that runs the automation, but they require more programming and configuration to get working. The API is also available in several programming languages. Firefox's headless documentation also recommends Selenium WebDriver for test automation.

ghost commented 5 years ago

Just discovered @k-stewart 's work in #141 as well.

wioux commented 5 years ago

Hi @efx. Our contact-congress work has moved over to EFForg/congress_forms_api to fix this and other issues. Sorry we didn't properly archive this repo -- I'm going to do that now.

ghost commented 5 years ago

Thanks @wioux. I had found this repository from EFF's homepage, so we should probably update those link(s) as well.

danielmroberts commented 1 year ago

> Hi @efx. Our contact-congress work has moved over to EFForg/congress_forms_api to fix this and other issues. Sorry we didn't properly archive this repo -- I'm going to do that now.

This repo is still not archived. We were about to roll out a system we have been working on for a while based on phantom of the capitol before noticing your comment :(