apify / actor-templates

This project is the :house: home of Apify actor template projects to help users quickly get started.
https://apify.com/

Scrapy Actor: Update ActorDatasetPushPipeline from 1 to 1000 #238

Closed vdusek closed 9 months ago

vdusek commented 9 months ago

While testing the Scrapy wrapper, I found a bug: ActorDatasetPushPipeline is configured with a priority of 1, so it runs before every other item pipeline. Items are stored before any other pipeline has processed them, which makes all other item pipelines essentially useless. It should run last, so that all the necessary cleaning steps are applied before the data is stored.

From Scrapy docs (https://docs.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component):

The integer values you assign to classes in this setting determine the order in which they run: items go through from lower valued to higher valued classes. It’s customary to define these numbers in the 0-1000 range.
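
To illustrate the ordering rule quoted above, here is a minimal sketch of the proposed setting. The dotted module paths and the `CleaningPipeline` class are hypothetical placeholders, not the template's actual layout, and `execution_order` is only a helper written for this example to mimic the lower-to-higher ordering the Scrapy docs describe.

```python
# Sketch of a Scrapy ITEM_PIPELINES setting (normally placed in settings.py).
# NOTE: the module paths and CleaningPipeline are hypothetical, for illustration only.
ITEM_PIPELINES = {
    "myproject.pipelines.CleaningPipeline": 100,
    # Proposed change: 1000 instead of 1, so the push to the dataset runs last.
    "myproject.pipelines.ActorDatasetPushPipeline": 1000,
}


def execution_order(pipelines: dict[str, int]) -> list[str]:
    """Return pipeline paths sorted from lowest to highest value.

    This mirrors the documented rule that items pass through
    lower-valued pipelines before higher-valued ones.
    """
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


if __name__ == "__main__":
    for path in execution_order(ITEM_PIPELINES):
        print(path)
```

With the old value of 1, the same helper would list ActorDatasetPushPipeline first, which is the behavior reported in this issue.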