357 flexible job pipelines for the datahandler

ra-tolson commented 7 years ago

This fixes #350 and some of #357. Masking isn't supported in this PR because the PR is already too long. Most important changes:

Job gets new fields for parameters, steps and kwargs. Steps is a dict of steps such as 'export' and 'aggregate'. Each value is a dict of arguments for the step in question. Currently supported workflows are:

No steps; job completes after fetch & process.
export.
export, then aggregate.

kwargs is a reserved field for future expansion, to let jobs have keyword arguments that are job-wide instead of specific to a step.
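As a rough illustration of the shape described above, a steps dict and a check against the three supported workflows might look like the sketch below. The function name, the constant, and the per-step argument names ('format', 'method') are invented for the example and are not taken from this PR.

```python
# Hypothetical sketch: names and step arguments are illustrative only.
SUPPORTED_WORKFLOWS = [
    (),                        # no steps: job completes after fetch & process
    ('export',),               # export only
    ('export', 'aggregate'),   # export, then aggregate
]


def workflow_is_supported(steps):
    """Return True if `steps` maps step names to argument dicts and the
    set of step names matches one of the supported workflows."""
    if not isinstance(steps, dict):
        return False
    if not all(isinstance(args, dict) for args in steps.values()):
        return False
    return any(set(steps) == set(workflow) for workflow in SUPPORTED_WORKFLOWS)


example_steps = {
    'export': {'format': 'tif'},       # invented argument
    'aggregate': {'method': 'mean'},   # invented argument
}
```

Under these assumptions, an aggregate step without an export step would be rejected, since that combination is not in the list of supported workflows.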

Added meaningful validation for various fields in various tables; it's not comprehensive but it's a start. Note this meant replacing update_status calls.

Added a custom dict literal field type to isolate all the repr and eval calls to a single place, and use consistent validation errors that we can trap for later on.
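The idea of keeping serialization and parsing in one place, with a single trappable error type, can be sketched without the ORM as follows. This is not the field class from the PR: the names are hypothetical, and it uses ast.literal_eval in place of eval, which is a deliberate substitution for the sketch.

```python
import ast


class DictLiteralError(ValueError):
    """Single error type raised for any invalid dict literal (hypothetical name)."""


def dump_dict_literal(value):
    """Serialize a dict to its repr() string for storage."""
    if not isinstance(value, dict):
        raise DictLiteralError('expected a dict, got %s' % type(value).__name__)
    return repr(value)


def load_dict_literal(text):
    """Parse a stored string back into a dict, raising DictLiteralError on
    anything that is not a valid dict literal."""
    try:
        # literal_eval only accepts Python literals, unlike a bare eval()
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError) as e:
        raise DictLiteralError('not a valid literal: %r' % (text,)) from e
    if not isinstance(value, dict):
        raise DictLiteralError('literal is not a dict: %r' % (text,))
    return value
```

Callers then only need to catch one exception class, regardless of whether the input was malformed or merely the wrong type.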

Configurable spatial chunking for PPJs via the PP_SPATIAL_CHUNK_SIZE setting.
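A minimal sketch of size-based chunking is shown below, assuming the setting is a positive integer giving the maximum number of spatial units per chunk. That interpretation, the value 3, and the function name are assumptions for illustration; the PR text does not spell them out.

```python
# Stand-in for the real setting; the value here is invented.
PP_SPATIAL_CHUNK_SIZE = 3


def chunk_spatial_units(unit_ids, chunk_size=PP_SPATIAL_CHUNK_SIZE):
    """Split a list of spatial unit IDs into consecutive chunks of at most
    `chunk_size` items each (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be a positive integer')
    return [unit_ids[i:i + chunk_size]
            for i in range(0, len(unit_ids), chunk_size)]
```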

The DH works with sqlite now; the major needed change was a postgres-only distinct() call.

Change PPJ submittal and working function: remove the job PK from PPJ args because it duplicates information (the job is already a foreign key on PPJs). Send PPJ IDs to the worker task via queue.submit and stop sending chunked arguments, because the PPJ arguments are already saved in the database. Catch exceptions and fail the corresponding PPJs.
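The worker side of this change can be sketched roughly as follows, with an in-memory dict standing in for the PPJ table. The function name, the row layout, and the status strings are all assumptions made for the example; the real worker and queue.submit signature are not shown in this PR description.

```python
def run_ppjs(ppj_ids, ppj_table, work):
    """Hypothetical worker: receives only PPJ IDs, loads each PPJ's saved
    arguments from `ppj_table`, and fails the PPJ if `work` raises."""
    for ppj_id in ppj_ids:
        ppj = ppj_table[ppj_id]          # arguments are looked up, not passed in
        try:
            # the job comes from the PPJ row itself rather than from its args
            work(ppj['job_id'], ppj['args'])
        except Exception as e:
            ppj['status'] = 'failed'     # invented status name
            ppj['error'] = str(e)
        else:
            ppj['status'] = 'complete'   # invented status name
```

One failing PPJ does not stop the rest of the batch in this sketch; whether the real worker behaves that way is not stated above.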

Refactor the job completion check into its own function, since the scheduling function for PPJs is already too long, and remove 'scheduled' from the check for task failure, as it creates a race condition with tasks freshly submitted earlier in the same scheduler run.

ra-tolson commented 7 years ago

Oops, forgot to fix submit_request, just a moment.

ra-tolson commented 7 years ago

fixed a bug; afaik this PR is ready for code review.

ghost commented 7 years ago

Can one of the admins verify this patch?

Applied-GeoSolutions / gips

357 flexible job pipelines for the datahandler #374