BryanWilhite / jupyter-central

an attempt to centralize my little collection 📚 of jupyter notebooks in one place 🚀 🌚 (which might not be a great idea)

get started with `beautifulsoup4` and `selenium` #28

Open BryanWilhite opened 4 years ago

BryanWilhite commented 4 years ago

i assume that sites like Bloomberg and Medium are too cool for Beautiful Soup on its own, requiring escalation to Selenium

https://pypi.org/project/beautifulsoup4/

https://pypi.org/project/selenium/

BryanWilhite commented 3 years ago

Selenium or BeautifulSoup?

Before answering your question directly, it's worth saying as a starting point: if all you need to do is pull content from static HTML pages, you should probably use an HTTP library (like Requests or the built-in urllib.request) with lxml or BeautifulSoup, not Selenium (although Selenium will probably be adequate too).

https://stackoverflow.com/a/17436663/22944

http://lxml.de/ http://www.crummy.com/software/BeautifulSoup/
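
A minimal sketch of the static-HTML route the quoted answer recommends, assuming `beautifulsoup4` is installed and the page needs no JavaScript to render. The names `fetch_html`, `extract_links`, and `SAMPLE_HTML` are made up for illustration, and the `User-Agent` header is only a guess at what some sites expect. The parsing step runs against an inline string so it works offline:

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with the standard library; no browser is involved."""
    # Assumption: some sites reject the default urllib User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every anchor that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (anchor.get_text(strip=True), anchor["href"])
        for anchor in soup.find_all("a", href=True)
    ]


# Hypothetical static page used in place of a live request.
SAMPLE_HTML = """
<html>
  <head><title>parsers</title></head>
  <body>
    <a href="http://lxml.de/">lxml</a>
    <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>
    <a name="no-href">ignored</a>
  </body>
</html>
"""

if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```

For a live page the call would be `extract_links(fetch_html(url))`. If that comes back empty for a site where links are clearly visible in the browser, the content is probably rendered client-side, which is the case where Selenium becomes worth trying.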