TromsFylkestrafikk / ragnarok

Harvest public transport data for statistics usage

Add 'pending' as state on models and don't allow operations on them #50

Closed (tfk-kaare closed 7 months ago)

tfk-kaare commented 8 months ago

Problem / Motivation

Batch operations over several chunks are time-consuming, and maintainers could easily be tempted to start new batch operations on chunks that are already in progress.

Suggested resolution

Add a 'pending' state on chunk models when they are queued in a batch job. How to implement this is not yet determined, but a few suggestions are:

  1. Add a new chunk column 'pending' and disallow all operations on the chunk while it is set. The value of this column should be the ID of the batch the chunk belongs to.
  2. Cache all pending chunks and map them to their batch ID. This is effectively a lock per chunk model, disallowing further operations while it is locked.
  3. Add a pivot table mapping batches to chunks and vice versa. This creates a many-to-many relationship, which isn't really what we want: two batch operations on the same chunk is exactly what we want to avoid.
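
To make suggestion 2 concrete, here is a minimal sketch of a per-chunk lock that maps chunk IDs to batch IDs. It is written in Python purely for illustration and uses a plain dict as a stand-in for a real cache. The class `ChunkLocks` and its method names are hypothetical and not part of ragnarok.

```python
class ChunkLocks:
    """In-memory stand-in for a cache mapping chunk ID -> batch ID (hypothetical)."""

    def __init__(self):
        self._pending = {}  # chunk_id -> batch_id

    def acquire(self, chunk_ids, batch_id):
        """Lock every chunk for batch_id, or none at all if any is already pending."""
        chunk_ids = list(chunk_ids)
        if any(chunk_id in self._pending for chunk_id in chunk_ids):
            return False
        for chunk_id in chunk_ids:
            self._pending[chunk_id] = batch_id
        return True

    def release(self, batch_id):
        """Unlock all chunks held by the given batch."""
        self._pending = {
            chunk_id: owner
            for chunk_id, owner in self._pending.items()
            if owner != batch_id
        }

    def batch_of(self, chunk_id):
        """Return the batch currently holding the chunk, or None if it is free."""
        return self._pending.get(chunk_id)


locks = ChunkLocks()
locks.acquire([1, 2], "batch-a")   # both chunks are now pending
locks.acquire([2, 3], "batch-b")   # refused: chunk 2 is already held
```

The all-or-nothing acquire is a design choice of this sketch: a batch that overlaps a running batch is rejected as a whole instead of being partially queued.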

tfk-kaare commented 8 months ago

It must be possible to revert this state, which means an additional column or cache is required to keep track of the original state before 'pending'.

tfk-kaare commented 8 months ago

This means the pending state should not be an option on the import_status or fetch_status columns. This smells like a cache problem space.

tfk-kaare commented 8 months ago

The immediate problem I see, regardless of the path forward, is that some batches have several jobs (fetch + import) per chunk. Finishing one job should not unlock the chunk, since another job is still pending.

Sooo, one lock for each of 'fetch' and 'import', and both need to be released before others can acquire them. Maybe a semaphore-ish lock where you acquire with N elements and all have to be released before the next in line can acquire it.
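
A rough sketch of such a counting lock, in Python for illustration only. The class `CountingChunkLock` and its methods are hypothetical names. The chunk is acquired with the number of jobs queued for it and only becomes free again once every job has released.

```python
class CountingChunkLock:
    """Semaphore-like lock: acquired with N jobs, free after N releases (hypothetical)."""

    def __init__(self):
        self._remaining = {}  # chunk_id -> number of unfinished jobs

    def acquire(self, chunk_id, jobs):
        """Lock the chunk for `jobs` jobs. Returns False if it is already locked."""
        if jobs < 1:
            raise ValueError("jobs must be at least 1")
        if chunk_id in self._remaining:
            return False
        self._remaining[chunk_id] = jobs
        return True

    def release(self, chunk_id):
        """Mark one job as finished. The last release frees the chunk."""
        if chunk_id not in self._remaining:
            raise RuntimeError("chunk is not locked")
        self._remaining[chunk_id] -= 1
        if self._remaining[chunk_id] == 0:
            del self._remaining[chunk_id]

    def is_locked(self, chunk_id):
        return chunk_id in self._remaining


lock = CountingChunkLock()
lock.acquire(7, jobs=2)   # fetch + import queued for chunk 7
lock.release(7)           # fetch done, chunk 7 is still locked
lock.release(7)           # import done, chunk 7 is free again
```

Note that the counter alone records how many jobs remain, not which batch queued them.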

tfk-kaare commented 7 months ago

The solution in my head is to continue to increase the number of columns in ragnarok_chunks. Two new columns: fetch_batch and import_batch. Semaphores or counters don't keep track of which batch a chunk belongs to.

This will require additional accessors and query scopes.
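
As a rough illustration of the two-column idea, the sketch below models a chunk row in Python, not the project's actual model code. The field names fetch_status, import_status, fetch_batch and import_batch come from this thread. The status value "new", the `is_pending` accessor, and the helpers `available`, `queue_batch` and `finish_job` are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

OPERATIONS = ("fetch", "import")


@dataclass
class Chunk:
    id: int
    fetch_status: str = "new"           # hypothetical status value
    import_status: str = "new"          # hypothetical status value
    fetch_batch: Optional[str] = None   # batch ID while a fetch job is queued
    import_batch: Optional[str] = None  # batch ID while an import job is queued

    @property
    def is_pending(self) -> bool:
        """Accessor: the chunk is pending while either batch column is set."""
        return self.fetch_batch is not None or self.import_batch is not None


def available(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Query-scope stand-in: only chunks that no batch is working on."""
    return [chunk for chunk in chunks if not chunk.is_pending]


def queue_batch(chunks: Iterable[Chunk], batch_id: str, operations: Iterable[str]) -> bool:
    """Tag chunks with batch_id per operation. Refuse if any chunk is pending."""
    chunks, operations = list(chunks), list(operations)
    if set(operations) - set(OPERATIONS):
        raise ValueError("unknown operation")
    if any(chunk.is_pending for chunk in chunks):
        return False
    for chunk in chunks:
        for operation in operations:
            setattr(chunk, f"{operation}_batch", batch_id)
    return True


def finish_job(chunk: Chunk, operation: str) -> None:
    """Clear the batch column for one finished job. Status columns are untouched."""
    if operation not in OPERATIONS:
        raise ValueError("unknown operation")
    setattr(chunk, f"{operation}_batch", None)
```

Because the status columns are never overwritten, there is no original state to restore when a batch finishes or is cancelled, and a chunk with both fetch and import queued stays pending until both columns are cleared.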