elliotgao2 / gain

Web crawling framework based on asyncio.
GNU General Public License v3.0

Custom header. #5

Closed: elliotgao2 closed this 7 years ago.

elliotgao2 commented 7 years ago

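```python
from gain import Parser, Spider  # Post is an Item subclass defined elsewhere


def generate_header():
    header = {'User-agent': 'Google spider'}
    return header


class MySpider(Spider):
    start_url = 'https://blog.scrapinghub.com/'
    header = generate_header  # the function itself, not generate_header()
    concurrency = 5
    # Raw strings keep the regex escapes (\d, \-) from being read as string escapes.
    parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
               Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]
```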
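Here `header = generate_header` stores the function object rather than its result. One Python subtlety matters if a framework is to accept this form: a plain function stored as a class attribute comes back unchanged when read through the class, but as a bound method when read through an instance, so calling it there would pass the instance as an extra argument. A minimal demonstration, using a hypothetical stand-in class `DemoSpider` rather than gain's `Spider`:

```python
def generate_header():
    header = {'User-agent': 'Google spider'}
    return header


class DemoSpider:
    # Stores the function object; nothing is called at class-definition time.
    header = generate_header


# Read through the class, the attribute is the plain function and can be called directly.
print(DemoSpider.header())  # {'User-agent': 'Google spider'}

# Read through an instance, it is a bound method: calling it would pass the
# instance as a positional argument, which generate_header() does not accept.
bound = DemoSpider().header
print(bound.__func__ is generate_header)  # True
```

A framework could sidestep this by always reading the attribute from the spider class, or users could wrap the function in `staticmethod(...)`, which prevents binding on instances.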

and


```python
class MySpider(Spider):
    start_url = 'https://blog.scrapinghub.com/'
    header = {'User-agent': 'Google spider'}
    concurrency = 5
    parsers = [Parser(r'https://blog.scrapinghub.com/page/\d+/'),
               Parser(r'https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]
```

Both forms should be supported: `header` may be either a dict or a callable that returns a dict.
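One way a framework could accept both forms is to normalize the attribute before each request. The sketch below is an assumption, not gain's actual implementation; `resolve_header`, `CallableSpider`, and `DictSpider` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Union

# Either a ready-made dict or a zero-argument function returning one.
HeaderSpec = Union[Dict[str, str], Callable[[], Dict[str, str]]]


def resolve_header(spec: HeaderSpec) -> Dict[str, str]:
    """Return a fresh header dict from a dict or a zero-argument callable (hypothetical helper)."""
    headers = spec() if callable(spec) else spec
    if not isinstance(headers, dict):
        raise TypeError(f'header must be a dict or return a dict, got {type(headers).__name__}')
    # Copy so per-request changes never mutate the spider's class attribute.
    return dict(headers)


def generate_header():
    return {'User-agent': 'Google spider'}


class CallableSpider:
    header = generate_header


class DictSpider:
    header = {'User-agent': 'Google spider'}


# The attribute is read from the class, so the callable form is the plain function.
print(resolve_header(CallableSpider.header) == resolve_header(DictSpider.header))  # True
```

Calling the function on every resolution means a callable `header` can yield different values per request, while a dict yields the same headers each time.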