DistributedProofreaders / dproofreaders

Distributed Proofreaders is a web application intended to ease the process of converting public domain books into e-texts.
https://www.pgdp.net
GNU General Public License v2.0

Convert SR notifications & PG catalog import to BackgroundJobs #1243

Closed: cpeel closed this 4 days ago

cpeel commented 1 week ago

This converts the SR (smoothreading) notification and PG (Project Gutenberg) catalog import cronjobs to BackgroundJobs. The SR change was a straightforward lift. The PG catalog import now extracts to a temporary location and cleans up after itself rather than writing to /data/htdocs/d/pg/catalog. The files are never used or accessed after the database import, so there's no reason for the import to run in the web context.
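
For readers unfamiliar with the extract-to-temp-and-clean-up pattern described above, here is a minimal sketch in Python. It is not the code in this PR (the project is PHP), and the names `import_catalog` and `process_file` are hypothetical, chosen only for illustration. The download step is left out. The sketch assumes the archive is already on disk and that the catalog entries are `.rdf` files somewhere inside it.

```python
import tarfile
import tempfile
from pathlib import Path


def import_catalog(archive_path, process_file):
    """Extract a .tar.bz2 into a throwaway directory, hand every .rdf file
    to process_file (a hypothetical callback), and return the file count."""
    processed = 0
    # TemporaryDirectory deletes the directory and everything in it when the
    # block exits, including when process_file raises.
    with tempfile.TemporaryDirectory(prefix="pg-catalog-files-") as workdir:
        with tarfile.open(archive_path, "r:bz2") as archive:
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(workdir, **extra)
        for rdf_file in sorted(Path(workdir).rglob("*.rdf")):
            process_file(rdf_file)
            processed += 1
    return processed
```

Because the working directory lives only for the duration of the call, nothing is left behind under the web root or anywhere else once the import finishes.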

Examples of both jobs with the verbose output:

cpeel@ip-172-31-12-95:~/u-cpeel-dp/crontab (sr-and-pg-import)$ php run_background_job.php SendSmoothreadingNotifications true
Background job: SendSmoothreadingNotifications
Status: No notifications sent
cpeel@ip-172-31-12-95:~/u-cpeel-dp/crontab (sr-and-pg-import)$ php run_background_job.php ImportPGCatalog true
Background job: ImportPGCatalog
Status: Processed 73799 etexts
Output:
Downloading https://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2 to /tmp/pg-catalog-files/rdf-files.tar.bz2...
Extracting files from /tmp/pg-catalog-files/rdf-files.tar.bz2 to /tmp/pg-catalog-files...
Scanning files in /tmp/pg-catalog-files...
Finished processing 73799 RDF files.
Warning: The following MIME types do not have entries in $display_mapping:
     11 application/x-musescore
Putting the data into the table...