alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.16k stars, 648 forks

Won't find special characters #41

Closed: BeenHijacked closed this issue 3 years ago

BeenHijacked commented 3 years ago

When I try to find anything that contains a period (.), I get no results.

from autoscraper import AutoScraper

url = 'https://pastebin.com/APSMFRLL'

# We can add one or multiple candidates here.
# You can also put urls here to retrieve urls.
wanted_list = ["."]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)

I would've expected to get:

one.two
three.four
five.six
seven.eight

Maybe I'm not doing something correctly.

alirezamika commented 3 years ago

The scraper looks for an exact match, not a pattern. Try this:

from autoscraper import AutoScraper

url = 'https://pastebin.com/APSMFRLL'

wanted_list = ["one.two"]

scraper = AutoScraper()
scraper.build(url, wanted_list)
result = scraper.get_result_similar(url, contain_sibling_leaves=True)
print(result)
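
Since the scraper matches exact text rather than patterns, one possible way to end up with only the entries that contain a period is to filter the returned list afterwards. The following is a minimal sketch, not part of AutoScraper itself: `filter_by_pattern` is a hypothetical helper, and the `scraped` list is a stand-in for whatever `get_result_similar` returns (the "header" entry is invented for illustration).

```python
import re


def filter_by_pattern(items, pattern):
    """Return the items whose text matches the given regular expression."""
    regex = re.compile(pattern)
    return [item for item in items if regex.search(item)]


# Stand-in for the list returned by scraper.get_result_similar(...).
# The "header" entry is made up to show a value that gets filtered out.
scraped = ["one.two", "three.four", "header", "five.six", "seven.eight"]

# Keep only the entries that contain a literal period.
dotted = filter_by_pattern(scraped, r"\.")
print(dotted)
```

This keeps the pattern matching outside the scraper, so the exact-match value in wanted_list only needs to be one known example from the page.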