Iceloof / GoogleNews

Script for GoogleNews
https://pypi.org/project/GoogleNews/
MIT License
314 stars 88 forks

added headless chrome for javascript render #87

Closed jacobhtye closed 2 years ago

jacobhtye commented 2 years ago

Pull request to fix the issue "Any success with fetching images!" (https://github.com/Iceloof/GoogleNews/issues/33). I modified the code to use Selenium in order to get the rendered page after JavaScript has run. This also allows scraping pages that use React and other frameworks.
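A minimal sketch of the approach described above, not the code from this pull request. It assumes Selenium 4 with Chrome and a matching chromedriver already available on the machine; `headless_chrome_arguments` and `fetch_rendered_html` are hypothetical names chosen for illustration.

```python
def headless_chrome_arguments():
    """Command-line switches passed to Chrome so it runs without a window."""
    return ["--headless", "--disable-gpu"]


def fetch_rendered_html(url):
    """Return the page source after Chrome has executed the page's JavaScript."""
    # Imported lazily so that importing this module does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_chrome_arguments():
        options.add_argument(argument)

    # Assumption: Selenium can locate both Chrome and chromedriver on its own.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down, even if loading the page fails.
        driver.quit()
```

The returned HTML could then be handed to whatever parser the library already uses, since it is the DOM after scripts have run rather than the raw response body.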

HurinHu commented 2 years ago

Selenium is highly dependent on chromedriver: if it can't find chromedriver on the default path, it won't work. If chromedriver is added as an optional parameter, it will affect current users when they upgrade.

jacobhtye commented 2 years ago

This is why there is also the driver-manager library. With it, users don't need to have the driver on their path or add it as a parameter. The library looks for the driver, and if it can't find it, it installs it and keeps it up to date.

jacobhtye commented 2 years ago

https://github.com/SergeyPirogov/webdriver_manager
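A rough sketch of how webdriver_manager is typically combined with Selenium 4, assuming both packages are installed; `create_chrome_driver` is a hypothetical helper name, not something defined in this pull request. Note that the library downloads and caches chromedriver only, so a Chrome or Chromium browser still has to be present on the machine.

```python
def create_chrome_driver(headless=True):
    """Create a Chrome WebDriver, letting webdriver_manager supply chromedriver."""
    # Imported lazily so the third-party packages are only needed when called.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    if headless:
        options.add_argument("--headless")

    # install() downloads chromedriver if it is not cached and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```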

HurinHu commented 2 years ago

No, it doesn't seem to work. I tried to run the unit tests and it cannot find the Chrome binary.

jacobhtye commented 2 years ago

OK, I can check that out. How are you running the unit tests? I saw that the Travis CI you have attached ran them and they passed, and they passed on my computer. I can see why it wouldn't be working for you.

HurinHu commented 2 years ago

python3 -m unittest discover 'test' 'test*.py'

I tested it under Ubuntu without Chrome installed. I also tried importing chromedriver-binary, but that doesn't seem to work well either.
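One way to keep the test suite usable on machines without a browser is to detect the Chrome binary up front and skip the browser-dependent tests when it is missing. The sketch below uses only the standard library; the executable names in `CHROME_CANDIDATES` are assumptions that vary by distribution, and `find_chrome_binary` and `TestRenderedPage` are hypothetical names.

```python
import shutil
import unittest

# Assumed executable names; the real name depends on how Chrome was installed.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(candidates=CHROME_CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


# Skip the whole class instead of failing when no browser is installed.
@unittest.skipUnless(find_chrome_binary(), "Chrome/Chromium binary not found on PATH")
class TestRenderedPage(unittest.TestCase):
    def test_browser_available(self):
        self.assertIsNotNone(find_chrome_binary())
```

Run with the same `unittest discover` command, such tests would be reported as skipped rather than as errors on a system like the Ubuntu setup described here.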