clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

add support for a secondary queue #571

Open · elrayle opened this issue 4 weeks ago

elrayle commented 4 weeks ago

Description

Add support for configuring a secondary queue. When the crawler is ready to process more requests, it will first look for requests in the primary queue. If none are found, it will look for requests in the secondary queue.
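A minimal TypeScript sketch of the lookup order described above. The `RequestQueue` interface, the `ArrayQueue` class, and the `popWithFallback` function are hypothetical names used only for illustration, not the crawler's API. Either argument could stand for a whole QueueSet. To keep the sketch short, the queues are synchronous and in-memory, whereas real queue operations would typically be asynchronous.

```typescript
// Hypothetical minimal queue interface for illustration only; real crawler
// queues are asynchronous and have a richer API.
interface RequestQueue<T> {
  pop(): T | null;
}

// In-memory queue used only to demonstrate the fallback order.
class ArrayQueue<T> implements RequestQueue<T> {
  private readonly items: T[];

  constructor(items: T[] = []) {
    this.items = items.slice();
  }

  pop(): T | null {
    return this.items.length > 0 ? (this.items.shift() as T) : null;
  }
}

// Check the primary queue first. Consult the secondary queue (if one is
// configured) only when the primary queue has nothing to hand out.
function popWithFallback<T>(
  primary: RequestQueue<T>,
  secondary: RequestQueue<T> | null,
): T | null {
  const request = primary.pop();
  if (request !== null) {
    return request;
  }
  return secondary !== null ? secondary.pop() : null;
}
```

Passing `null` for the secondary queue reproduces today's single-queue behavior.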

Tasks:

Configuration

For example, the Kubernetes config in crawler.yml would change as follows. Other places also require updates, including the other example configs, the config defaults in code, and the code that processes the configs and queues.

Current Configuration

            - name: CRAWLER_QUEUE_PREFIX
              valueFrom:
                secretKeyRef:
                  name: secrets
                  key: CRAWLER_QUEUE_PREFIX

Proposed Configuration

            - name: CRAWLER_QUEUE_PREFIX
              valueFrom:
                secretKeyRef:
                  name: secrets
                  key: CRAWLER_QUEUE_PREFIX

            - name: CRAWLER_SECONDARY_QUEUE_PREFIX
              valueFrom:
                secretKeyRef:
                  name: secrets
                  key: CRAWLER_SECONDARY_QUEUE_PREFIX
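The following is a sketch of how the code that processes the configs might turn these two environment variables into queue names. It assumes the secondary QueueSet uses the same four suffixes as the primary one. `loadQueueConfig`, `queueNamesFor`, `QueueConfig`, and the `defaultPrefix` argument are hypothetical. The environment is passed in as a plain object, which in Node.js would be `process.env`, so the sketch does not depend on Node.js typings.

```typescript
// Suffixes from the discussion in this issue: a QueueSet consists of
// prefix-immediate, prefix-normal, prefix-soon, and prefix-later.
const QUEUE_SUFFIXES = ["immediate", "normal", "soon", "later"];

// Builds the four queue names of one QueueSet from a prefix.
function queueNamesFor(prefix: string): string[] {
  return QUEUE_SUFFIXES.map((suffix) => `${prefix}-${suffix}`);
}

// Hypothetical shape of the resolved queue configuration.
interface QueueConfig {
  primary: string[];
  secondary: string[] | null; // null when no secondary prefix is configured
}

// Reads both prefixes from an environment-like object.
function loadQueueConfig(
  env: { [key: string]: string | undefined },
  defaultPrefix: string,
): QueueConfig {
  const primaryPrefix = env["CRAWLER_QUEUE_PREFIX"] || defaultPrefix;
  const secondaryPrefix = env["CRAWLER_SECONDARY_QUEUE_PREFIX"];
  if (secondaryPrefix && secondaryPrefix === primaryPrefix) {
    // Sharing a prefix would make the "secondary" queues identical to the primary ones.
    throw new Error("CRAWLER_SECONDARY_QUEUE_PREFIX must differ from CRAWLER_QUEUE_PREFIX");
  }
  return {
    primary: queueNamesFor(primaryPrefix),
    secondary: secondaryPrefix ? queueNamesFor(secondaryPrefix) : null,
  };
}
```

When `CRAWLER_SECONDARY_QUEUE_PREFIX` is unset, the function returns `null` for `secondary`. This keeps the current single-QueueSet behavior as the default.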
qtomlinson commented 3 weeks ago

Instead of adding a secondary queue (internally, a second QueueSet), the same goal could be reached by making the QueueSet itself configurable. Currently, the QueueSet is hardcoded internally as prefix-immediate, prefix-normal, prefix-soon, and prefix-later, and the prefix is the only configurable part. Rather than configuring the prefix and constructing the queue names implicitly, it may be possible to make the full list of queue names configurable.

The configuration would then specify prefix-immediate, prefix-normal, prefix-soon, and prefix-later explicitly, and additional queues could be added alongside these four. The weights of the queues in a QueueSet are already configurable, so pulls could still be prioritized toward specific queues.
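The following is a sketch of this alternative: a QueueSet built from an explicit list of queue names, with an extra queue alongside the usual ones. It assumes that the weights decide which queue a pop starts from and that the remaining queues are tried in order as a fallback. The crawler's actual weighting may work differently. `QueueSpec`, `MemoryQueue`, and `ConfigurableQueueSet` are hypothetical names. The random source is injectable so the selection can be tested deterministically.

```typescript
// Hypothetical configuration entry: an explicit queue name plus its weight.
interface QueueSpec {
  name: string;
  weight: number; // non-negative integer; higher means more often chosen as the starting queue
}

// In-memory stand-in for a real queue, used only for illustration.
class MemoryQueue {
  private readonly items: string[] = [];

  constructor(readonly name: string) {}

  push(item: string): void {
    this.items.push(item);
  }

  pop(): string | null {
    const item = this.items.shift();
    return item === undefined ? null : item;
  }
}

class ConfigurableQueueSet {
  private readonly queues: MemoryQueue[] = [];
  // Each queue index appears `weight` times, so random picks favor heavier queues.
  private readonly startMap: number[] = [];

  constructor(specs: QueueSpec[], private readonly random: () => number = Math.random) {
    for (let i = 0; i < specs.length; i++) {
      const weight = specs[i].weight;
      if (weight < 0 || Math.floor(weight) !== weight) {
        throw new Error(`invalid weight for ${specs[i].name}: ${weight}`);
      }
      this.queues.push(new MemoryQueue(specs[i].name));
      for (let j = 0; j < weight; j++) {
        this.startMap.push(i);
      }
    }
    if (this.startMap.length === 0) {
      throw new Error("at least one queue needs a positive weight");
    }
  }

  // Adds a request to the named queue.
  push(name: string, item: string): void {
    for (const queue of this.queues) {
      if (queue.name === name) {
        queue.push(item);
        return;
      }
    }
    throw new Error(`unknown queue: ${name}`);
  }

  // Picks a weighted starting queue, then tries the others in list order (wrapping around).
  pop(): string | null {
    const start = this.startMap[Math.floor(this.random() * this.startMap.length)];
    for (let i = 0; i < this.queues.length; i++) {
      const item = this.queues[(start + i) % this.queues.length].pop();
      if (item !== null) {
        return item;
      }
    }
    return null;
  }
}
```

Because the scan wraps around the list, a low-weight queue such as a hypothetical `extra-backfill` is not strictly served last. Given the list `cd-immediate`, `cd-normal`, `extra-backfill`, a pop that starts at `cd-normal` reaches `extra-backfill` before `cd-immediate`. Weights therefore give a statistical preference rather than the strict primary-then-secondary order proposed in the issue.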