The logs are accumulating indefinitely. They went back a month and were full of ~60 error logs per second like this:
[2024-08-26T21:48:43Z ERROR glados_audit::state] Error getting random state root. err=Failed to acquire connection from pool
I just deleted the logs to free up space and make deployments work again. I'm not sure this instance was ever working; it's my first time interacting with it.
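For the record, the space can usually be reclaimed without touching the containers themselves. A minimal sketch, assuming Docker's default json-file logging driver (which matches the log format below) and a hypothetical container name, which would need to be swapped for whatever docker ps actually shows:

# Hypothetical container name; substitute the real one from `docker ps`.
LOG=$(docker inspect --format '{{.LogPath}}' glados-audit)
# Zero the json-file log in place; the container keeps writing to the same file handle.
sudo truncate -s 0 "$LOG"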
The trin logs are also accumulating indefinitely, with a batch like this once every 30 seconds:
{"log":"2024-08-26T21:54:25.513513Z INFO trin_state: reports~ data: radius=0.0000% content=0.0/0mb #=0 disk=0.1mb; msgs: offers=0/0, accepts=0/0, validations=0/0\n","stream":"stdout","time":"2024-08-26T21:54:25.513802105Z"}
{"log":"2024-08-26T21:54:25.513522Z INFO trin_state: reports~ utp: (in/out): active=0 (0/0), success=0 (0/0), failed=0 (0/0) failed_connection=0 (0/0), failed_data_tx=0 (0/0), failed_shutdown=0 (0/0)\n","stream":"stdout","time":"2024-08-26T21:54:25.513806243Z"}
{"log":"2024-08-26T21:54:25.514674Z INFO trin_beacon: reports~ data: radius=0.0000% content=0.0/0mb #=0 disk=0.1mb; msgs: offers=0/0, accepts=0/0, validations=0/0\n","stream":"stdout","time":"2024-08-26T21:54:25.514888242Z"}
{"log":"2024-08-26T21:54:25.514693Z INFO trin_beacon: reports~ utp: (in/out): active=0 (0/0), success=0 (0/0), failed=0 (0/0) failed_connection=0 (0/0), failed_data_tx=0 (0/0), failed_shutdown=0 (0/0)\n","stream":"stdout","time":"2024-08-26T21:54:25.514921703Z"}
{"log":"2024-08-26T21:54:25.514736Z WARN portalnet::overlay::service: No nodes in routing table, find nodes query cannot proceed.\n","stream":"stdout","time":"2024-08-26T21:54:25.514927801Z"}
{"log":"2024-08-26T21:54:25.552064Z INFO trin_history: reports~ data: radius=0.0000% content=0.0/0mb #=0 disk=0.1mb; msgs: offers=0/0, accepts=0/0, validations=0/0\n","stream":"stdout","time":"2024-08-26T21:54:25.552324875Z"}
{"log":"2024-08-26T21:54:25.552092Z INFO trin_history: reports~ utp: (in/out): active=0 (0/0), success=0 (0/0), failed=0 (0/0) failed_connection=0 (0/0), failed_data_tx=0 (0/0), failed_shutdown=0 (0/0)\n","stream":"stdout","time":"2024-08-26T21:54:25.55237521Z"}
Hm, my sense of the chatter is that we do not really expect this glados instance to be working, so I'm not going to do any prevention work right now.
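If someone does pick up the prevention work later, the usual fix is to cap the json-file driver's log size. A rough sketch, assuming the default json-file driver and example limits (note that daemon-wide defaults only apply to containers created after the change):

# Per-container flags; the json-file driver supports max-size / max-file rotation.
docker run --log-opt max-size=50m --log-opt max-file=3 <image>
# Alternatively, set "log-opts" defaults in /etc/docker/daemon.json and restart dockerd;
# existing containers would need to be re-created to pick the defaults up.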
Todo:
First look:
So it looks like the docker containers for trin and glados-audit are eating up the majority of the space, especially glados-audit. For comparison, in the mainnet instance the largest docker container is only ~200MB, compared to 86GB on angelfood.
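For anyone repeating the comparison, a rough sketch of how to attribute disk usage to each container's log file, assuming the default json-file driver and a standard /var/lib/docker layout:

# Print each container's log size alongside its name, largest first.
for id in $(docker ps -aq); do
  printf '%s\t%s\n' "$(sudo du -h "$(docker inspect --format '{{.LogPath}}' "$id")" | cut -f1)" \
                    "$(docker inspect --format '{{.Name}}' "$id")"
done | sort -rh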