apify / actor-templates

This project is the :house: home of Apify actor template projects to help users quickly get started.
https://apify.com/

Python Scrapy Actor: Logging from Spiders (and other components) is still problematic #232

Closed: vdusek closed 9 months ago

vdusek commented 9 months ago

Description

Scrapy recommends using the Spider.logger property for logging within spiders. The property is defined as follows:

    @property
    def logger(self) -> logging.LoggerAdapter:
        logger = logging.getLogger(self.name)
        return logging.LoggerAdapter(logger, {"spider": self})

This approach aligns the logger's name with the spider's name. However, our custom logging configuration in python-scrapy/src/main.py#L40:L59 interferes with this logging mechanism when the project is executed on Apify.
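To make the naming consequence concrete, here is a minimal sketch using only the standard library. `SpiderStandIn` is a hypothetical stand-in for `scrapy.Spider`, reduced to the `logger` property quoted above. It shows that the resulting logger is a top-level logger named after the spider, so it is not a child of a `scrapy` logger.

```python
import logging


class SpiderStandIn:
    """Hypothetical stand-in for scrapy.Spider, reduced to the logger property."""

    name = 'book_spider'

    @property
    def logger(self) -> logging.LoggerAdapter:
        # Same body as the property quoted from Scrapy above.
        logger = logging.getLogger(self.name)
        return logging.LoggerAdapter(logger, {'spider': self})


spider = SpiderStandIn()
adapter = spider.logger

# The underlying logger carries the spider's name, not a 'scrapy.*' name.
underlying_name = adapter.logger.name
is_scrapy_child = underlying_name.startswith('scrapy.')

# A name without dots has the root logger as its direct parent.
parent_is_root = adapter.logger.parent is logging.getLogger()
```

Because the name contains no dots, any configuration applied only to specifically named loggers will not reach it unless the spider's name is also listed or the root logger is configured.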

Example

Consider the following spider:

    from __future__ import annotations

    from typing import Generator

    from scrapy import Request, Spider
    from scrapy.http import Response

    # BookItem is the project's own item class, defined elsewhere in the project.


    class BookSpider(Spider):
        name = 'book_spider'
        start_urls = ['http://books.toscrape.com/']

        def parse(self, response: Response) -> Generator[BookItem | Request, None, None]:
            # This is the message that goes missing when the project runs on Apify.
            self.logger.info(f'BookSpider is parsing {response}...')
            articles = response.css('article.product_pod')

            for article in articles:
                yield BookItem(
                    title=article.css('h3 > a::attr(title)').get().strip(),
                    price=article.css('.price_color::text').get().strip(),
                    rating=article.css('.star-rating::attr(class)').get().strip(),
                    in_stock=article.css('.instock.availability::text').getall()[1].strip(),
                )

            # Follow pagination until there is no "next" link.
            next_page_link = response.css('li.next a::attr(href)').get()
            if next_page_link:
                yield response.follow(next_page_link)

When the spider is run with Scrapy directly, the message "BookSpider is parsing..." appears in the log. When the project is run on Apify, the message does not appear at all.
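The symptom can be reproduced outside Scrapy and Apify with a small sketch. It rests on an assumption that is not confirmed by the text above: that the custom configuration attaches a handler only to an explicit list of logger names. The `configure` helper, the `ListHandler` class, and the logger names `scrapy` and `apify` are all hypothetical and chosen for illustration; the actual contents of main.py may differ.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the outcome can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f'{record.name}: {record.getMessage()}')


def configure(names: list[str], handler: logging.Handler) -> None:
    # Hypothetical stand-in for a custom logging setup: only the listed
    # loggers receive a level and a handler.
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.addHandler(handler)


handler = ListHandler()
configure(['scrapy', 'apify'], handler)

logging.getLogger('scrapy').info('engine started')
# The spider's logger is named after the spider, so it is not in the list.
logging.getLogger('book_spider').info('BookSpider is parsing...')
before_fix = list(handler.messages)

# Including the spider's logger name makes the message reach the handler.
configure(['book_spider'], handler)
logging.getLogger('book_spider').info('BookSpider is parsing...')
after_fix = list(handler.messages)
```

Under this assumption, `before_fix` contains only the `scrapy` message, while `after_fix` also contains the spider's message. Configuring the root logger instead of individual names would be another way to cover loggers whose names are not known in advance.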

Possible solution