Ceramic request rate limiting failover doesn't work

What happened?

The Ceramic caching in Hasura uses Bottleneck to limit the rate at which requests are made to the Ceramic daemon.

There is a memory leak in the JavaScript IPFS daemon that causes it to fail under sufficient load (#998). Kubernetes restarts the pod automatically, but the Ceramic daemon is offline for a couple of minutes while the pod restarts.

Bottleneck emits a 'failed' event, which can be handled with limiter.on('failed', async (error, jobInfo) => {}). If the listener returns a number, the job is retried after that many milliseconds, which would allow an outage to be handled gracefully.
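A minimal, self-contained sketch of that retry contract, written so it runs without the bottleneck package. runWithFailedHandler, FailedHandler and retryWhileDaemonRestarts are hypothetical names, not Bottleneck's API; the sketch only assumes Bottleneck's documented behaviour that a 'failed' listener receives the error plus a retryCount, and that returning a number of milliseconds schedules a retry while returning nothing lets the job fail. The backoff policy (1 s doubling up to a 30 s cap, eight retries, roughly two minutes in total) is an assumption sized to the restart window described above.

```typescript
// Hypothetical stand-in for Bottleneck's retry flow; this is NOT the library itself.
type JobInfo = { retryCount: number };

// Return a delay in ms to retry, or undefined to give up (mirrors the 'failed' listener contract).
type FailedHandler = (error: unknown, info: JobInfo) => number | undefined;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithFailedHandler<T>(
  job: () => Promise<T>,
  onFailed: FailedHandler,
): Promise<T> {
  let retryCount = 0;
  while (true) {
    try {
      return await job();
    } catch (error) {
      // Only errors that escape the job (a throw or a rejected promise) reach the handler.
      const delay = onFailed(error, { retryCount });
      if (delay === undefined) throw error; // handler gave up: surface the original error
      retryCount++;
      await sleep(delay);
    }
  }
}

// Assumed policy: exponential backoff capped at 30 s, giving up after 8 retries (~121 s total).
const retryWhileDaemonRestarts: FailedHandler = (error, { retryCount }) => {
  console.warn(`Ceramic request failed (retry ${retryCount}):`, error);
  if (retryCount >= 8) return undefined;
  return Math.min(1000 * 2 ** retryCount, 30_000);
};
```

With the real limiter, the body of retryWhileDaemonRestarts would be the listener passed to limiter.on('failed', ...), returning the delay instead of sleeping itself.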

I have registered a failure handler, but it isn't getting called when a job fails.

What did you expect to happen?

When a job fails, the failure handler is called.

How can we reproduce the problem (as minimally as possible)?

Set maxConcurrent to null and minTime to 0, then run yarn hasura:seed-local-db. With no limits, all jobs execute simultaneously and the load takes down the Ceramic daemon.
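For comparison, a sketch of the two limiter settings mentioned in this issue, written as plain option objects so it runs without the bottleneck package. maxConcurrent and minTime are real Bottleneck options (their defaults, null and 0, mean no limit); LimiterOptions and limiterOptionsFor are hypothetical helpers, and reading "30 concurrent jobs at ten per second" as maxConcurrent = 30 with minTime = 1000 / 10 = 100 ms is an assumption about the current configuration.

```typescript
// Mirrors two Bottleneck options; null / 0 are Bottleneck's "unlimited" defaults.
interface LimiterOptions {
  maxConcurrent: number | null; // max jobs running at once (null = unlimited)
  minTime: number; // minimum milliseconds between job starts (0 = no spacing)
}

// Hypothetical helper: convert "N jobs per second" into Bottleneck's minTime spacing.
function limiterOptionsFor(maxConcurrent: number, jobsPerSecond: number): LimiterOptions {
  return { maxConcurrent, minTime: 1000 / jobsPerSecond };
}

// Assumed normal setting: 30 jobs in flight, starts spaced for 10 jobs per second.
const normal = limiterOptionsFor(30, 10);

// Repro setting from this issue: every queued job starts at once.
const unthrottled: LimiterOptions = { maxConcurrent: null, minTime: 0 };

console.log(normal); // { maxConcurrent: 30, minTime: 100 }
console.log(unthrottled); // { maxConcurrent: null, minTime: 0 }
```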

The failure handler contains logging statements, and none of them are triggered.
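One possibility worth ruling out (an assumption, not a diagnosis): Bottleneck only emits 'failed' when the scheduled function throws or its returned promise rejects. If the function that calls Ceramic catches its own error and returns a fallback, the limiter sees a successful job. The sketch below contrasts the two shapes; loadStream, swallowing, rethrowing and the stream ID are hypothetical placeholders for whatever the Hasura cache actually schedules.

```typescript
// Hypothetical Ceramic loader signature; stands in for the real call the limiter schedules.
type Loader = (streamId: string) => Promise<string>;

// Swallows the error: the promise resolves with null, so a limiter records a success
// and a 'failed' listener would never run.
function swallowing(loadStream: Loader) {
  return async (streamId: string): Promise<string | null> => {
    try {
      return await loadStream(streamId);
    } catch (error) {
      console.error(`Ceramic load failed for ${streamId}`, error);
      return null;
    }
  };
}

// Logs and rethrows: the promise rejects, which is what a 'failed' listener reacts to.
function rethrowing(loadStream: Loader) {
  return async (streamId: string): Promise<string> => {
    try {
      return await loadStream(streamId);
    } catch (error) {
      console.error(`Ceramic load failed for ${streamId}`, error);
      throw error;
    }
  };
}

// Simulates the daemon being offline during a pod restart.
const daemonDown: Loader = async () => {
  throw new Error('ECONNREFUSED');
};
```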

Is there anything else we need to know?

This isn't a serious issue: with 30 concurrent jobs at ten per second, the queue is processed relatively quickly and hasn't caused an outage. Still, it would be nice to have working retries just in case.

MetaFam / TheGame