alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.16k stars, 648 forks

scraper.build returns a blank list #64

Closed: p595285902 closed this issue 2 years ago

p595285902 commented 3 years ago

Here is the code to reproduce:

    from autoscraper import AutoScraper

    class Scraper:

        # Text that AutoScraper should find on origUrl in order to learn its rules
        wanted_list = ["0.79"]
        origUrl = 'https://www.sec.gov/Archives/edgar/data/0001744489/000174448921000105/fy2021_q2xprxex991.htm'
        newUrl = 'https://www.sec.gov/Archives/edgar/data/0001744489/000174448921000179/fy2021_q3xprxex991.htm'
        path = "Alpaca/Scraper/sec/file.txt"  # not used in this repro

        def scrape(self):
            scraper = AutoScraper()
            # Fetch origUrl and learn rules that locate the wanted_list text
            result = scraper.build(self.origUrl, self.wanted_list)
            print(result)
            # Apply the learned rules unchanged to newUrl
            result = scraper.get_result_exact(self.newUrl)
            print(result)

    if __name__ == '__main__':
        scraper = Scraper()
        scraper.scrape()

Here is the output:

    []
    []

Expected output:

    ['0.79']
    ['0.80']

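
A quick way to tell whether the empty lists come from the fetched page rather than from AutoScraper's rule learning is to download the page yourself and check whether the wanted text appears in it at all. The sketch below uses only the standard library; `missing_values`, `fetch`, and `main` are hypothetical helper names, and the User-Agent string is a placeholder, not a value from the thread. A substring check is only a necessary condition, since AutoScraper matches element text rather than raw HTML.

```python
import urllib.error
import urllib.request

# URL and wanted value copied from the issue; the User-Agent used in main()
# is a hypothetical placeholder, not a value from the thread.
URL = "https://www.sec.gov/Archives/edgar/data/0001744489/000174448921000105/fy2021_q2xprxex991.htm"
WANTED_LIST = ["0.79"]


def missing_values(html: str, wanted_list: list[str]) -> list[str]:
    """Return the wanted values that do not occur anywhere in `html`.

    Only a necessary condition: AutoScraper compares against element text,
    so a value can appear here and still go unmatched.
    """
    return [w for w in wanted_list if w not in html]


def fetch(url: str, user_agent: str) -> tuple[int, str]:
    """Fetch `url` and return (HTTP status, decoded body), keeping error pages."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A refused request usually still carries a body (e.g. a notice page),
        # which is what a scraper that ignores the status code would parse.
        return err.code, err.read().decode("utf-8", errors="replace")


def main() -> None:
    status, html = fetch(URL, user_agent="example-scraper admin@example.com")
    print("status:", status)
    print("missing:", missing_values(html, WANTED_LIST))
```

Calling `main()` needs network access. A non-200 status, or `0.79` showing up in the missing list, would suggest the scraper never received the filing itself.
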
alirezamika commented 2 years ago

This website uses bot-detection methods. You could load the page with Selenium and pass the resulting HTML to the scraper.
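
The suggestion above can be sketched as follows, assuming Selenium 4.x with a local Chrome install and `autoscraper` are available. AutoScraper's `build` and `get_result_exact` accept an `html` argument in place of a URL, so the browser does the fetching and AutoScraper only parses the string. `fetch_html`, `scrape_from_html`, and `main` are hypothetical names, and whether headless Chrome actually gets past the site's checks is not confirmed in this thread.

```python
# Assumes selenium 4.x (with a local Chrome) and autoscraper are installed.
# The imports live inside the functions so this module loads without them.

ORIG_URL = "https://www.sec.gov/Archives/edgar/data/0001744489/000174448921000105/fy2021_q2xprxex991.htm"
NEW_URL = "https://www.sec.gov/Archives/edgar/data/0001744489/000174448921000179/fy2021_q3xprxex991.htm"
WANTED_LIST = ["0.79"]


def fetch_html(url: str) -> str:
    """Load `url` in headless Chrome and return the page source it rendered."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if get() fails


def scrape_from_html(
    orig_html: str, new_html: str, wanted_list: list[str]
) -> tuple[list[str], list[str]]:
    """Learn rules from orig_html, then apply them unchanged to new_html."""
    from autoscraper import AutoScraper

    scraper = AutoScraper()
    # html= makes AutoScraper parse this string instead of requesting a URL.
    built = scraper.build(html=orig_html, wanted_list=wanted_list)
    exact = scraper.get_result_exact(html=new_html)
    return built, exact


def main() -> None:
    built, exact = scrape_from_html(fetch_html(ORIG_URL), fetch_html(NEW_URL), WANTED_LIST)
    print(built)
    print(exact)
```

Calling `main()` launches the browser twice, once per filing. Keeping the fetch step separate from `scrape_from_html` also lets you save the rendered HTML to disk and rerun the scraping without hitting the site again.
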