Open cgueret opened 8 years ago
The simplest fix is probably to do three things:

1. [...] `modified` column of the `state` table.
2. [...] `spindle_mqmessage_reject_()` to update the `modified` column.
3. [...] `libmq` implementation (i.e., in `spindle_mq_next_()`, prior to attempting the current one) which does something like `SELECT "id" FROM "state" WHERE "status" = 'REJECTED' AND "tinyhash" % nodecount = nodeid AND "modified" <= cutoff`, where `cutoff` is a timestamp 24 hours in the past (or better, a configurable value, which would mean you could ask Twine to re-process all rejected items by specifying the cutoff as 0 on the command-line).
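To make the retry query above concrete, here is a minimal sketch that runs it against an in-memory SQLite database. It is not Spindle's or Twine's actual code (which is C). The column types, the `COMPLETE` status value, the sample rows, and the helper names `retry_cutoff`, `mark_rejected`, `rejected_ids_to_retry` and `make_demo_db` are all assumptions made for illustration. The sketch also assumes that `modified` is stored as a Unix timestamp and that "cutoff as 0" means a minimum age of zero seconds.

```python
import sqlite3

DAY = 24 * 60 * 60  # seconds


def retry_cutoff(now, min_age_seconds=DAY):
    """Timestamp at or before which a REJECTED row becomes eligible again.

    Assumption: a minimum age of 0 makes every rejected row eligible.
    """
    return now - min_age_seconds


def mark_rejected(conn, item_id, now):
    """Hypothetical stand-in for the reject step: also refresh "modified"."""
    conn.execute(
        'UPDATE "state" SET "status" = ?, "modified" = ? WHERE "id" = ?',
        ("REJECTED", now, item_id),
    )


def rejected_ids_to_retry(conn, nodeid, nodecount, cutoff):
    """Run the query from the thread with bound parameters."""
    rows = conn.execute(
        'SELECT "id" FROM "state" '
        'WHERE "status" = ? AND "tinyhash" % ? = ? AND "modified" <= ? '
        'ORDER BY "id"',
        ("REJECTED", nodecount, nodeid, cutoff),
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db(now):
    """Build a tiny "state" table; schema and rows are invented for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "state" ('
        '"id" TEXT PRIMARY KEY, "status" TEXT, '
        '"tinyhash" INTEGER, "modified" INTEGER)'
    )
    conn.executemany(
        'INSERT INTO "state" VALUES (?, ?, ?, ?)',
        [
            ("a", "REJECTED", 0, now - 2 * DAY),  # old, belongs to node 0
            ("b", "REJECTED", 1, now - 2 * DAY),  # old, belongs to node 1
            ("c", "REJECTED", 2, now - 60),       # rejected a minute ago
            ("d", "COMPLETE", 4, now - 2 * DAY),  # not rejected at all
        ],
    )
    return conn


if __name__ == "__main__":
    now = 1_000_000
    conn = make_demo_db(now)
    print(rejected_ids_to_retry(conn, 0, 2, retry_cutoff(now)))
    print(rejected_ids_to_retry(conn, 0, 2, retry_cutoff(now, 0)))
```

With two nodes, node 0 only picks up rejected rows whose `tinyhash` is even, and with the default 24-hour minimum age it skips the row rejected a minute ago. Passing a minimum age of 0 makes that row eligible as well, which is the "re-process all rejected items" case.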
If the connection to S3 is lost while the queue is being processed, the affected URIs are marked as REJECTED and never re-visited. The ingest of the data set is then never "complete":
In the DB, the REJECTED URIs look like: