MetaFam / TheGame

The platform that MetaGame will be played on aka MetaOS - an open source framework for running decentralized societies. Currently featuring MyMeta Profiles, Dashboard & Quests
https://metagame.wtf

Ceramic request rate limiting failover doesn't work #1061

Closed: dysbulic closed this issue 10 months ago

dysbulic commented 2 years ago

What happened?

The Ceramic caching in Hasura uses Bottleneck to limit the rate at which requests are made to the Ceramic daemon.

There is a memory leak in the JavaScript IPFS daemon that causes it to fail under sufficient load (#998). Kubernetes automatically restarts the pod, but the Ceramic daemon goes offline for a couple of minutes while it restarts.

Bottleneck supports a failure listener, registered with limiter.on('failed', (error, jobInfo) => …). If the listener returns a number, Bottleneck retries the failed job after that many milliseconds, which would let us ride out an outage gracefully.
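As a sketch of the retry-with-delay idea, the listener below backs off exponentially and gives up after a fixed number of retries. It relies on Bottleneck's documented contract for the 'failed' event (the listener receives the error and a jobInfo object with a retryCount; returning a number schedules a retry after that many ms, returning undefined lets the job fail). The name ceramicRetryDelay, the FailedJobInfo type, and the delay constants are hypothetical choices for illustration, not taken from the MetaGame code. The function itself has no dependencies; the wiring to a real limiter is shown in a comment because it needs the bottleneck package.

```typescript
// Local stand-in for the jobInfo argument Bottleneck passes to 'failed'
// listeners. Only retryCount is used; this type is an assumption, not
// Bottleneck's own typing.
interface FailedJobInfo {
  retryCount: number;
}

// Hypothetical tuning values (not from the MetaGame codebase).
const BASE_DELAY_MS = 5000; // first retry after 5 s
const MAX_DELAY_MS = 60000; // cap the wait between attempts at 1 minute
const MAX_RETRIES = 8; // 5+10+20+40+60+60+60+60 s, about 5 minutes total

// Returns the delay in ms before Bottleneck should retry the job,
// or undefined to stop retrying and let the job fail.
function ceramicRetryDelay(
  error: unknown,
  jobInfo: FailedJobInfo,
): number | undefined {
  if (jobInfo.retryCount >= MAX_RETRIES) {
    console.error(`Giving up after ${jobInfo.retryCount} retries:`, error);
    return undefined;
  }
  // Exponential backoff: 5 s, 10 s, 20 s, 40 s, then capped at 60 s.
  const delay = Math.min(BASE_DELAY_MS * Math.pow(2, jobInfo.retryCount), MAX_DELAY_MS);
  console.warn(`Ceramic request failed; retrying in ${delay} ms`);
  return delay;
}

// Wiring (requires the bottleneck package):
//   import Bottleneck from 'bottleneck';
//   const limiter = new Bottleneck({ maxConcurrent: 30, minTime: 100 });
//   limiter.on('failed', ceramicRetryDelay);
```

Keeping the total retry window (about five minutes here) longer than a pod restart is the point of the design: a job that fails while the daemon is down waits it out instead of being dropped. Note that Bottleneck only emits 'failed' when the scheduled job function throws or returns a rejected promise, so the listener is never called if the job catches its own errors.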

I have registered a failure handler, but it isn't getting called when a job fails.

What did you expect to happen?

When a job fails, the failure handler is called.

How can we reproduce the problem (as minimally as possible)?

Setting maxConcurrent to null and minTime to 0 makes all jobs execute simultaneously, which takes down the Ceramic daemon when running yarn hasura:seed-local-db.

The failure handler contains logging statements, and none of them are triggered.

Is there anything else we need to know?

This isn't a serious issue: with the current settings (at most 30 concurrent jobs, started at ten per second), the queue is processed relatively quickly and hasn't caused an outage. Still, it would be nice to have failover in place just in case.
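To illustrate why the repro settings overload the daemon while the normal ones don't, here is a simplified, dependency-free model of how Bottleneck's minTime (minimum ms between consecutive job starts) and maxConcurrent (null meaning unlimited) space out jobs. It assumes "ten per second" corresponds to minTime: 100 and that every job takes a fixed time; startTimes and LimiterSettings are hypothetical names, not part of Bottleneck or the MetaGame code, and the model ignores Bottleneck's other options (reservoirs, priorities, clustering).

```typescript
// The two Bottleneck constructor options this model covers.
interface LimiterSettings {
  maxConcurrent: number | null; // null = no concurrency limit
  minTime: number; // minimum ms between consecutive job starts
}

// Returns the time (ms) at which each of `jobs` queued jobs would start,
// assuming every job runs for exactly `jobDurationMs`.
function startTimes(
  jobs: number,
  settings: LimiterSettings,
  jobDurationMs: number,
): number[] {
  const starts: number[] = [];
  const finishes: number[] = []; // finish times of jobs already started
  let lastStart = -Infinity;
  for (let i = 0; i < jobs; i++) {
    // Respect minTime spacing after the previous start.
    let t = Math.max(0, lastStart + settings.minTime);
    if (settings.maxConcurrent !== null && i >= settings.maxConcurrent) {
      // All slots are taken until enough earlier jobs finish: wait for the
      // (i - maxConcurrent)-th earliest finish time.
      const sorted = [...finishes].sort((a, b) => a - b);
      t = Math.max(t, sorted[i - settings.maxConcurrent]);
    }
    starts.push(t);
    finishes.push(t + jobDurationMs);
    lastStart = t;
  }
  return starts;
}

console.log(startTimes(5, { maxConcurrent: null, minTime: 0 }, 2000)); // → [0, 0, 0, 0, 0]
console.log(startTimes(5, { maxConcurrent: 30, minTime: 100 }, 2000)); // → [0, 100, 200, 300, 400]
```

With the repro settings every queued job starts at time 0, so the daemon receives the whole burst at once; with minTime: 100 the starts are spread 100 ms apart, so a queue of N jobs takes roughly N × 100 ms to launch.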

Seroxdesign commented 10 months ago

Outdated; Ceramic has also been updated since.