element-hq / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://element-hq.github.io/dendrite/
GNU Affero General Public License v3.0

Sanity-check where we panic/recover #899

matrixbot closed this issue 3 weeks ago

matrixbot commented 3 weeks ago

This issue was originally created by @kegsay at https://github.com/matrix-org/dendrite/issues/899.

Dendrite should only panic as a result of:

Dendrite should not recover panics unless we know it won't break the component afterwards.

Dendrite shouldn't panic in other scenarios because a given component does not know whether it is running in monolith or polylith mode. A panic in monolith mode takes out the entire server. A panic in polylith mode takes out only that component, which hopefully restarts and continues correctly.
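As a rough illustration of the "only recover when it is known to be safe" rule, here is a minimal Go sketch. It assumes a task that touches no shared state, so a panic inside it cannot leave the component half-broken. The helper name runIsolated is hypothetical and not part of Dendrite.

```go
package main

import (
	"fmt"
	"log"
)

// runIsolated is a hypothetical helper, not a Dendrite API. It runs a task
// that is assumed to own all of its state, so recovering from a panic inside
// it cannot leave shared data in an inconsistent condition.
func runIsolated(name string, task func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			// Log loudly and turn the panic into an ordinary error.
			log.Printf("recovered panic in %s: %v", name, r)
			err = fmt.Errorf("%s panicked: %v", name, r)
		}
	}()
	task()
	return nil
}

func main() {
	err := runIsolated("example-task", func() {
		var m map[string]int
		m["x"] = 1 // writing to a nil map panics
	})
	// The process is still alive; the caller decides what to do with err.
	fmt.Println("component still running, task error:", err)
}
```

If the task did mutate shared state (for example, holding a lock or half-updating a cache), recovering like this would be exactly the unsafe case described above, and letting the panic propagate would be the more honest choice.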

matrixbot commented 3 weeks ago

This comment was originally posted by @kegsay at https://github.com/matrix-org/dendrite/issues/899#issuecomment-597763353.

If consumers encounter a corrupt event for a room, mark the room as ‘bad’, e.g. in a map[string]bool, and then continue on, with appropriately shouty logs. When they encounter more events for that room, log and drop them.
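The bad-room idea could be sketched roughly as follows. The Event and Consumer types and the Corrupt flag are hypothetical simplifications for illustration, not Dendrite's real types. The sketch also assumes events are handled on a single goroutine, since a plain map is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Event is a hypothetical, simplified stand-in for a room event.
type Event struct {
	RoomID  string
	Corrupt bool // stands in for whatever check detects corruption
}

// Consumer is a hypothetical consumer that remembers rooms it has given up on.
type Consumer struct {
	badRooms  map[string]bool
	Processed int
}

func NewConsumer() *Consumer {
	return &Consumer{badRooms: make(map[string]bool)}
}

var errCorrupt = errors.New("corrupt event")

// process returns an error for a corrupt event instead of panicking.
func (c *Consumer) process(ev Event) error {
	if ev.Corrupt {
		return errCorrupt
	}
	c.Processed++
	return nil
}

// OnEvent reports whether the event was processed (true) or dropped (false).
func (c *Consumer) OnEvent(ev Event) bool {
	if c.badRooms[ev.RoomID] {
		log.Printf("dropping event for bad room %s", ev.RoomID)
		return false
	}
	if err := c.process(ev); err != nil {
		log.Printf("ERROR: marking room %s as bad: %v", ev.RoomID, err)
		c.badRooms[ev.RoomID] = true
		return false
	}
	return true
}

func main() {
	c := NewConsumer()
	c.OnEvent(Event{RoomID: "!a:example.org", Corrupt: true}) // marks the room bad
	c.OnEvent(Event{RoomID: "!a:example.org"})                // dropped
	c.OnEvent(Event{RoomID: "!b:example.org"})                // processed
	fmt.Println("processed:", c.Processed)
}
```

Other rooms keep flowing through the consumer, which is the point: one corrupt room costs that room, not the component.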

If consumers can fail more gracefully (e.g. by being more resilient) then they should do whatever preserves the functioning of the component. For example, if the public rooms API cannot add a room, oh well: don't crash the server, just log and drop.
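A log-and-drop loop for the public rooms example might look something like this. The roomStore type and its AddRoom method are hypothetical, and the empty-ID failure is only a stand-in for whatever could go wrong when adding a room.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// roomStore is a hypothetical stand-in for the storage behind a public rooms API.
type roomStore struct {
	rooms map[string]bool
}

// AddRoom returns an error, rather than panicking, when a room cannot be stored.
func (s *roomStore) AddRoom(roomID string) error {
	if roomID == "" {
		return errors.New("empty room ID")
	}
	s.rooms[roomID] = true
	return nil
}

// addAll tries every room, logs each failure and moves on.
// It returns how many rooms were dropped.
func addAll(s *roomStore, roomIDs []string) (dropped int) {
	for _, id := range roomIDs {
		if err := s.AddRoom(id); err != nil {
			log.Printf("failed to add public room %q, dropping: %v", id, err)
			dropped++
		}
	}
	return dropped
}

func main() {
	s := &roomStore{rooms: make(map[string]bool)}
	dropped := addAll(s, []string{"!a:example.org", "", "!b:example.org"})
	fmt.Println("stored:", len(s.rooms), "dropped:", dropped)
}
```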

matrixbot commented 3 weeks ago

This comment was originally posted by @kegsay at https://github.com/matrix-org/dendrite/issues/899#issuecomment-680924879.

I don't feel there's anything actionable here anymore - agreed, @neilalexander?

matrixbot commented 3 weeks ago

This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/899#issuecomment-680939471.

Yep, agreed.