DistributedProofreaders / dproofreaders

Distributed Proofreaders is a web application intended to ease the process of converting public domain books into e-texts.
https://www.pgdp.net
GNU General Public License v2.0

New background job infrastructure #1241

Closed cpeel closed 1 week ago

cpeel commented 1 week ago

This is groundwork to address https://github.com/DistributedProofreaders/dproofreaders/issues/1233. I have a branch that builds on this one and converts all the crontab/* files to confirm they will all work in this framework, but am presenting just two converted scripts here as part of the infra review.

This PR introduces a new BackgroundJob abstract class that background jobs should inherit from. It handles capturing runtime, writing entries into job_logs, capturing output, and handling errors. Jobs will be run from the new crontab/run_background_job.php script, which is expected to be run from the CLI. If a job requires a web context (due to file permissions, etc.), it sets a variable and the CLI script will proxy the request to the web server and run the job from there. I plan to eventually convert automodify.php to this infra, which will remove our dependency on URL_DUMP_PROGRAM entirely.
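To make the shape of the base class concrete, here is a minimal sketch of the pattern described above: a base class that times the job, captures whatever it prints, and turns an exception into a failure status. It is written in Python purely for illustration (the real class in this PR is PHP), and every name in it (`work`, `run`, `status`, `output`, `succeeded`, `runtime`, and the two example jobs) is an assumption rather than the actual API. Writing to job_logs and the web-context proxy are left out.

```python
import contextlib
import io
import time
from abc import ABC, abstractmethod


class BackgroundJob(ABC):
    """Hypothetical base class; all member names are illustrative."""

    def __init__(self):
        self.status = ""        # short summary set by the job (or the error)
        self.output = ""        # everything the job printed while running
        self.succeeded = False
        self.runtime = 0.0      # wall-clock seconds spent in work()

    @abstractmethod
    def work(self):
        """Do the job's real work; may print freely and set self.status."""

    def run(self):
        buffer = io.StringIO()
        start = time.monotonic()
        try:
            # Capture stdout so the caller decides whether to show it.
            with contextlib.redirect_stdout(buffer):
                self.work()
            self.succeeded = True
        except Exception as exc:
            # Any exception marks the job as failed and becomes its status.
            self.status = f"ERROR: {exc}"
        finally:
            self.runtime = time.monotonic() - start
            self.output = buffer.getvalue()
        return self.succeeded


class ToggleQueuesExample(BackgroundJob):
    """Illustrative job that succeeds."""

    def work(self):
        print("Looking for special events to open...")
        self.status = "0 queues opened"


class FailingExample(BackgroundJob):
    """Illustrative job that raises partway through."""

    def work(self):
        print("about to fail")
        raise RuntimeError("boom")
```

The point of the design is that an individual job only implements `work()`; timing, output capture, and error handling live in one place.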

I structured the background jobs as classes with filenames that match the class name. This allows jobs to be defined and loaded by the autoloader and does not require run_background_job.php to know about them ahead of time. The script strictly validates who can run it and that the requested job is a valid BackgroundJob.
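As a rough illustration of the "is the requested job a valid BackgroundJob" check, the sketch below resolves a job name to a class and rejects anything that is not a concrete subclass of the base class. It is Python rather than PHP, the `KNOWN_CLASSES` dictionary is only a stand-in for what the autoloader would find, and `resolve_job` is a hypothetical helper, not a function from this PR. The "who can run it" check is not shown.

```python
import inspect
import re


class BackgroundJob:
    """Stand-in base class for this sketch only."""


class ToggleSpecialDayQueues(BackgroundJob):
    """Stand-in for a real job class."""


class NotAJob:
    """A class that exists but does not inherit from BackgroundJob."""


# Stand-in for the autoloader: maps class names to loadable classes.
KNOWN_CLASSES = {
    "BackgroundJob": BackgroundJob,
    "ToggleSpecialDayQueues": ToggleSpecialDayQueues,
    "NotAJob": NotAJob,
}


def resolve_job(name, namespace):
    """Return the job class for `name`, or raise ValueError if it is not valid."""
    # Only plain identifiers are accepted, so paths and similar input are rejected early.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid job name: {name!r}")
    candidate = namespace.get(name)
    if not inspect.isclass(candidate):
        raise ValueError(f"unknown job: {name}")
    # The base class itself is not a runnable job; only subclasses are.
    if candidate is BackgroundJob or not issubclass(candidate, BackgroundJob):
        raise ValueError(f"{name} is not a BackgroundJob")
    return candidate
```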

By default, BackgroundJob will only write to stdout if there is an error, so cron will only send emails on failures. Scripts are then welcome to output data to their heart's content and it will only be seen if there is an error. This can be changed with an optional second parameter. When we first roll this out to PROD we'll be setting the second param to true to confirm things look good and then going to silent-on-success mode.
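The silent-on-success rule boils down to one decision, sketched here in Python for illustration (the real code is PHP). The function name `render_report` and its parameters are assumptions; the report layout mirrors the "Background job / Status / Output" lines in the sample run in this description.

```python
def render_report(job_name, status, output, succeeded, show_on_success=False):
    """Return the text to write to stdout, or "" when nothing should be written."""
    # A successful job stays silent unless the caller asked to see everything,
    # so cron has nothing to email about.
    if succeeded and not show_on_success:
        return ""
    return f"Background job: {job_name}\nStatus: {status}\nOutput:\n{output}"
```

A failed job always produces a report, regardless of the flag.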

The two converted scripts in this PR are:

Examples of the first script. No error so there is no output:

$ php run_background_job.php ToggleSpecialDayQueues
$

Shows all output:

$ php run_background_job.php ToggleSpecialDayQueues true
Background job: ToggleSpecialDayQueues
Status: 0 queues open'd, 0 queues close'd
Output:
Looking for special events to open...
                SELECT spec_code
                FROM special_days
                WHERE open_month = 6 AND open_day = 17
                    AND enable = 1
            0 queues open'd.
Looking for special events to close...
                SELECT spec_code
                FROM special_days
                WHERE close_month = 6 AND close_day = 16
                    AND enable = 1
            0 queues close'd.
$

You can see the script runs in the Job Log at https://www.pgdp.org/~cpeel/c.branch/new-background-job-infra/tools/site_admin/show_job_log.php?days_ago=1&filename=&event=END

Between the file renames and the changes to put each script in a class, these are a PITA to review 😕

cpeel commented 1 week ago

I've added one more commit that updates the job_logs table to explicitly capture the job status, and BackgroundJob to populate that status. This will help us more easily locate failed jobs in the UI.

I've updated an existing upgrade script that has not yet been run on PROD so it should be safe to change.