alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License

URL cannot carry parameters #101

Open beichen2023 opened 1 week ago

beichen2023 commented 1 week ago

I have discovered a fatal issue: if the URL contains query parameters, the information cannot be obtained.

Successful code:

```python
from autoscraper import AutoScraper

url = 'https://book.douban.com/'
wanted_list = ["8.3"]

scraper = AutoScraper()

# Here we can also pass html content via the html parameter instead of the url (html=html_content)
result = scraper.build(url, wanted_list)
print(result)
```

Failed code (with `?subcat=`):

```python
from autoscraper import AutoScraper

url = 'https://book.douban.com/latest?subcat=%E5%85%A8%E9%83%A8'
wanted_list = ["67"]

scraper = AutoScraper()

# Here we can also pass html content via the html parameter instead of the url (html=html_content)
result = scraper.build(url, wanted_list)
print(result)
```
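The query string in the failing URL is ordinary percent-encoded UTF-8: `%E5%85%A8%E9%83%A8` decodes to 全部 ("all"). The sketch below is independent of autoscraper and uses only Python's standard `urllib.parse`. It shows that the URL parses as a normal path plus one query parameter and round-trips through `urlencode`, which suggests the `?subcat=` part is itself well-formed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://book.douban.com/latest?subcat=%E5%85%A8%E9%83%A8"

# Split the URL into its components and decode the query string (UTF-8 by default).
parts = urlsplit(url)
params = parse_qs(parts.query)
print(parts.path)  # /latest
print(params)      # {'subcat': ['全部']}

# Rebuild the same URL from the decoded parameter to confirm the encoding round-trips.
rebuilt = f"https://book.douban.com/latest?{urlencode({'subcat': '全部'})}"
print(rebuilt == url)  # True
```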

alirezamika commented 1 week ago

As I checked manually, there's no "67" on the second URL's page. I checked with other wanted elements and there was no problem.
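Following the maintainer's diagnosis, a quick way to confirm whether a wanted value exists at all is to search the text of the HTML your script actually receives. This is a sketch using only the standard library's `html.parser`. `_TextCollector` and `missing_wanted_items` are hypothetical helper names, not part of autoscraper. The helper inspects text nodes only, skipping `<script>` and `<style>` content, and applies a loose substring test. Treat it as a rough pre-check rather than a reimplementation of how `build()` matches elements.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects non-empty text nodes, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore text inside <script>/<style> and whitespace-only nodes.
        if not self._skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def missing_wanted_items(html: str, wanted_list: list[str]) -> list[str]:
    """Hypothetical helper: return wanted items not found in any text node (substring test)."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return [w for w in wanted_list if not any(w in chunk for chunk in parser.chunks)]


# Example with inline HTML; "67" only occurs inside a <script>, so it is reported missing.
page = "<html><body><span class='rating'>8.3</span><script>var x = 67;</script></body></html>"
print(missing_wanted_items(page, ["8.3", "67"]))  # ['67']
```

If an item comes back as missing, a reasonable next step is to pick a wanted value that does appear in the fetched HTML. Another option is to fetch the page yourself and pass it through the `html` parameter mentioned in the issue.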