Checkpointing for large operations

AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Checkpointing for large operations #209

Closed: charvolant closed this issue 6 years ago

charvolant commented 7 years ago

It would be helpful to have checkpointing for bulk processing and bulk indexing operations, so that if the job falls over halfway through, it can start from a last known good point, rather than from the beginning. That way, if a 4-day job falls over in the last hour, it can be restarted easily.

djtfmartin commented 7 years ago

I've added this for Cassandra 3. It's a simple file-based checkpoint that uses the index token ranges, and it works well for both processing and indexing. For indexing, there's an additional step: before indexing restarts, it deletes the entries from any partially indexed token range.
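The file-based, token-range checkpoint described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not biocache-store's actual (Scala) implementation. The class `TokenRangeCheckpoint`, the function `run_with_checkpoint`, the `START`/`DONE` line format, and the `process_range`/`delete_range` callbacks are all hypothetical. A range with a `START` record but no matching `DONE` record is treated as partially processed. For indexing, the caller passes a `delete_range` callback that clears that range's entries before it is re-indexed. For processing, re-running a range is assumed to be safe (idempotent writes), so no callback is passed.

```python
import os
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical representation of a token range as (start_token, end_token).
TokenRange = Tuple[int, int]


class TokenRangeCheckpoint:
    """Append-only checkpoint file (hypothetical format).

    Each line is "START <start> <end>" or "DONE <start> <end>".
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.started: Set[TokenRange] = set()
        self.done: Set[TokenRange] = set()
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    parts = line.split()
                    if len(parts) != 3 or parts[0] not in ("START", "DONE"):
                        continue  # skip malformed lines, e.g. a torn write from a crash
                    rng = (int(parts[1]), int(parts[2]))
                    if parts[0] == "START":
                        self.started.add(rng)
                    else:
                        self.done.add(rng)

    def _append(self, event: str, rng: TokenRange) -> None:
        with open(self.path, "a") as f:
            f.write(f"{event} {rng[0]} {rng[1]}\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before continuing

    def mark_started(self, rng: TokenRange) -> None:
        self._append("START", rng)
        self.started.add(rng)

    def mark_done(self, rng: TokenRange) -> None:
        self._append("DONE", rng)
        self.done.add(rng)


def run_with_checkpoint(
    ranges: Iterable[TokenRange],
    checkpoint: TokenRangeCheckpoint,
    process_range: Callable[[TokenRange], None],
    delete_range: Optional[Callable[[TokenRange], None]] = None,
) -> List[TokenRange]:
    """Process each token range, skipping those already finished in an earlier run."""
    processed: List[TokenRange] = []
    for rng in ranges:
        if rng in checkpoint.done:
            continue  # completed in a previous run
        if rng in checkpoint.started and delete_range is not None:
            # Started but never finished: clear partial index entries first.
            delete_range(rng)
        checkpoint.mark_started(rng)
        process_range(rng)  # an exception here leaves the range marked as partial
        checkpoint.mark_done(rng)
        processed.append(rng)
    return processed
```

If a job fails partway through, rerunning it with the same checkpoint file skips every completed range and restarts at the first unfinished one. The cost of a crash is therefore bounded by the size of one token range rather than by the length of the whole job.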

djtfmartin commented 6 years ago

This is done for version 2.x, and I don't think we'll do it for 1.9.