alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.24k stars 654 forks

wanted_list presupposes knowledge of page #11

Closed zbrill closed 4 years ago

zbrill commented 4 years ago

Perhaps we want the scraper to find data that we know will be on the page, even though we don't know its value ahead of time. Using your first example for the Stack Overflow related questions, what if we could instead modify the list like this:

wanted_list = ["Related"]

and this would still return the same output: 'How do I merge two dictionaries in a single expression in Python (taking union of dictionaries)?', 'How to call an external command?',..., 'Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?'

Similarly, for the stock price example we need to know the stock's value ahead of time. What if we want to grab those values using the scraper?

wanted_list = ["Previous Close", "Day's Range"]

This would also let get_result_exact return the corresponding values when given another stock's Yahoo URL.

I'm not familiar enough with BeautifulSoup to know how simple it would be to parse out these elements' values, but it could be worth thinking about.
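To make the idea concrete, here is a minimal sketch of label-based lookup. It is not how autoscraper works, and it uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free. The helper name values_after_labels, the TextCollector class, and the sample markup are all made up for illustration, and the sketch assumes the value is simply the text node that follows the label in document order.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of a page in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def values_after_labels(html, labels):
    """Hypothetical helper: map each label to the text node right after it.

    Assumption: the wanted value is the next text node following the label.
    Labels that are missing, or that are the last text on the page, map to None.
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    texts = collector.texts

    result = {}
    for label in labels:
        if label in texts:
            i = texts.index(label)
            result[label] = texts[i + 1] if i + 1 < len(texts) else None
        else:
            result[label] = None
    return result


# Made-up markup standing in for a quote page; not the real Yahoo Finance HTML.
SAMPLE_HTML = """
<table>
  <tr><td>Previous Close</td><td>101.25</td></tr>
  <tr><td>Day's Range</td><td>100.10 - 102.40</td></tr>
</table>
"""

wanted_labels = ["Previous Close", "Day's Range"]
print(values_after_labels(SAMPLE_HTML, wanted_labels))
```

A real page would likely need a stricter notion of "adjacent" (same table row, sibling element, and so on), which is where the accuracy cost of guessing from labels would show up.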

Cheers!

alirezamika commented 4 years ago

It's a cool idea, but I'm not sure how useful it would be. Is it worth the performance and accuracy penalty? I mean, if you know what you want from that page, you've probably opened it. So why not just copy and paste the value into the wanted list?