filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Check max_map_count on start #9996

Open Stebalien opened 1 year ago

Stebalien commented 1 year ago

Improvement Suggestion

We should read /proc/sys/vm/max_map_count on start and print a warning if it's less than 512<<10 (524288, roughly 8x the usual Linux default of 65530), the threshold suggested by @ribasushi.
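For illustration, a minimal sketch of what such a startup check could look like. This is not Lotus code: the names `recommendedMaxMapCount`, `parseMaxMapCount` and `maxMapCountWarning` are hypothetical, and only the procfs path and the 512<<10 threshold come from the suggestion above.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// recommendedMaxMapCount is the threshold proposed in this issue (512<<10 = 524288).
const recommendedMaxMapCount = 512 << 10

// maxMapCountPath is the Linux procfs file that exposes the current limit.
const maxMapCountPath = "/proc/sys/vm/max_map_count"

// parseMaxMapCount parses the file contents: a decimal integer followed by a newline.
func parseMaxMapCount(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// maxMapCountWarning returns a warning message when the limit is below the
// recommended threshold, and an empty string otherwise.
func maxMapCountWarning(current int) string {
	if current >= recommendedMaxMapCount {
		return ""
	}
	return fmt.Sprintf(
		"vm.max_map_count is %d, below the recommended %d; the node may run out of memory maps",
		current, recommendedMaxMapCount)
}

func main() {
	raw, err := os.ReadFile(maxMapCountPath)
	if err != nil {
		// Not Linux, or procfs is unavailable: there is nothing to check.
		log.Printf("skipping max_map_count check: %v", err)
		return
	}
	current, err := parseMaxMapCount(string(raw))
	if err != nil {
		log.Printf("could not parse %s: %v", maxMapCountPath, err)
		return
	}
	if msg := maxMapCountWarning(current); msg != "" {
		log.Printf("WARNING: %s", msg)
	}
}
```

Keeping the parsing and the threshold comparison in separate functions means the check can be tested without touching procfs.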

Stebalien commented 1 year ago

Motivation: We're running out of maps due to badger and the FVM.

ribasushi commented 1 year ago

Prior-art UX/DX from an unrelated product: https://stackoverflow.com/q/42889241

jennijuju commented 1 year ago

This is needed for v1.20.0, correct?

magik6k commented 1 year ago

Could we manage this like we do the FD limit today? (https://github.com/filecoin-project/lotus/blob/b706efc33bd392d5be4f0be6679f8c5b23d98e26/lib/ulimit/ulimit.go). Probably not, given that it's system-wide?

For a VERY visible warning we should use the alert system - e.g. https://github.com/filecoin-project/lotus/blob/b706efc33bd392d5be4f0be6679f8c5b23d98e26/node/modules/alerts.go#L8.

ribasushi commented 1 year ago

@magik6k yeah, it's not a ulimit type of setting. The value is still applied per-process, but it is set globally via sysctl.

You could be smarter and combine the alert with the "count badger datafiles" feature from another ticket. Then you can derive an intelligent ballpark based on:

Then:

mur-me commented 1 year ago

I want to put my two cents in: it would be better to stop lotus with a non-zero exit code on start and put a message in the logs. The operator will read the reason and fix it at the beginning.

That is better UX for users: fail fast instead of letting them ignore a log message and crash afterwards.
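A rough sketch of the fail-fast variant, for comparison with the warning-only approach. Again, this is not Lotus code: `validateMaxMapCount` and the wording of the message are hypothetical, and the sysctl command in the message is just the standard way to raise the limit on Linux.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// minMaxMapCount is the threshold proposed earlier in this issue (512<<10 = 524288).
const minMaxMapCount = 512 << 10

// validateMaxMapCount takes the raw contents of /proc/sys/vm/max_map_count and
// returns an error describing the problem and the fix if the limit is too low.
func validateMaxMapCount(raw string, minimum int) error {
	current, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return fmt.Errorf("parsing vm.max_map_count: %w", err)
	}
	if current < minimum {
		return fmt.Errorf(
			"vm.max_map_count is %d, need at least %d; run: sysctl -w vm.max_map_count=%d",
			current, minimum, minimum)
	}
	return nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if errors.Is(err, os.ErrNotExist) {
		// No such file (for example, not Linux): skip the check entirely.
		return
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read vm.max_map_count:", err)
		os.Exit(1)
	}
	if err := validateMaxMapCount(string(raw), minMaxMapCount); err != nil {
		// Refuse to start so the operator sees the reason immediately.
		fmt.Fprintln(os.Stderr, "refusing to start:", err)
		os.Exit(1)
	}
}
```

The trade-off is that a hard exit also blocks operators who run with a lower limit without problems, so an override flag or environment variable might be needed in practice.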

ribasushi commented 1 year ago

@TippyFlitsUK I won't have capacity to work on this: assigning it to me ensures it will never get done :)