fmira21 / ton-node-docker

The Open Network full node in Docker

TonlibWorker #000 is dead!!! Exit code: 12 #3

Open dwjorgeb opened 6 months ago

dwjorgeb commented 6 months ago

Hello

I'm getting these errors from the API, and I can't make any requests to the liteserver:

2024-05-03 00:08:05.991 | ERROR    | pyTON.manager:check_children_alive:232 - TonlibWorker #000 is dead!!! Exit code: 12
2024-05-03 00:08:06.000 | INFO    | pyTON.manager:read_results:184 - Task read_results from TonlibWorker #000 was cancelled
2024-05-03 00:08:08.002 | ERROR    | pyTON.manager:check_children_alive:232 - TonlibWorker #000 is dead!!! Exit code: 12
2024-05-03 00:08:08.011 | INFO     | pyTON.manager:read_results:184 - Task read_results from TonlibWorker #000 was cancelled

2024-05-03 00:18:11.562 | ERROR    | pyTON.manager:check_children_alive:232 - TonlibWorker #000 is dead!!! Exit code: 12
2024-05-03 00:18:11.572 | INFO     | pyTON.manager:read_results:184 - Task read_results from TonlibWorker #000 was cancelled
2024-05-03 00:18:13.573 | ERROR    | pyTON.manager:check_children_alive:232 - TonlibWorker #000 is dead!!! Exit code: 12
2024-05-03 00:18:13.582 | INFO     | pyTON.manager:read_results:184 - Task read_results from TonlibWorker #000 was cancelled

All requests end in a timeout:

{"ok":false,"error":"Liteserver timeout","code":504}

If I call /getWorkerState, I get this result:

{'code': None,
 'error': None,
 'ok': True,
 'result': {'0': {'id': {'@type': 'pub.ed25519',
                         'key': 'ZZZ'},
                  'ip': 2902065154,
                  'is_archival': False,
                  'is_enabled': False,
                  'is_working': True,
                  'last_block': 37663839,
                  'ls_index': 0,
                  'port': 43679,
                  'restart_count': 0,
                  'tasks_count': 0}}}

The last_block value seems to be a few minutes out of date, but otherwise the state looks alright.
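For readers debugging the same output, here is a minimal Python sketch that turns the /getWorkerState result into a one-line summary per worker. The dictionary is copied from the post above, trimmed to the fields used; `int_to_ipv4` and `summarize_workers` are hypothetical helper names, not part of the API service. The mask in `int_to_ipv4` is there on the assumption that the same address may also appear as a signed 32-bit integer, as it does in TON config files.

```python
import ipaddress

# Trimmed copy of the /getWorkerState response shown above.
state = {
    "ok": True,
    "result": {
        "0": {
            "ip": 2902065154,
            "port": 43679,
            "is_enabled": False,
            "is_working": True,
            "last_block": 37663839,
        }
    },
}


def int_to_ipv4(value):
    """Convert an integer-encoded IPv4 address to dotted notation.

    The mask maps a signed 32-bit value onto the unsigned range first.
    """
    return str(ipaddress.IPv4Address(value & 0xFFFFFFFF))


def summarize_workers(response):
    """Return one human-readable line per worker (hypothetical helper)."""
    lines = []
    for index, worker in sorted(response["result"].items()):
        address = f"{int_to_ipv4(worker['ip'])}:{worker['port']}"
        lines.append(
            f"worker {index} at {address}: "
            f"enabled={worker['is_enabled']}, "
            f"working={worker['is_working']}, "
            f"last_block={worker['last_block']}"
        )
    return lines


for line in summarize_workers(state):
    print(line)
```

The summary makes the combination `working=True` with `enabled=False` easy to spot, which is the detail that matters for the timeouts described here.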

But if I do any other request, it times out. Any help debugging this?

fmira21 commented 6 months ago

Hi @dwjorgeb, thanks for the heads-up, and sorry for the silence. I finally managed to reproduce this behaviour.

When your node is out of sync, the API service marks it as is_enabled: false, because the service is designed to work with multiple nodes. The issue may resolve itself once your node catches up to the tip. Still, to prevent this from happening, I would recommend sticking to the official node system requirements to minimise sync gaps.
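One way to act on this is to poll the worker state until at least one worker reports is_enabled again. The sketch below is only an illustration under stated assumptions: `wait_until_enabled` is a hypothetical helper, and `fetch_state` is a caller-supplied function that returns the parsed /getWorkerState response (the HTTP call itself is left out, since the API's address depends on your deployment). The canned responses in the demo are made up, apart from the first one, which mirrors the state reported in this issue.

```python
import time


def wait_until_enabled(fetch_state, timeout_s=600.0, interval_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until some worker has is_enabled set, or the timeout expires.

    fetch_state: hypothetical callable returning the parsed
    /getWorkerState response as a dict.
    Returns the list of enabled worker indices (empty on timeout).
    """
    deadline = clock() + timeout_s
    while True:
        workers = fetch_state().get("result") or {}
        enabled = [i for i, w in workers.items() if w.get("is_enabled")]
        if enabled:
            return enabled
        if clock() >= deadline:
            return []
        sleep(interval_s)


# Demo with canned responses instead of a real HTTP call.
responses = iter([
    {"ok": True, "result": {"0": {"is_enabled": False}}},  # still out of sync
    {"ok": True, "result": {"0": {"is_enabled": True}}},   # caught up (made-up)
])
print(wait_until_enabled(lambda: next(responses), interval_s=0))
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting; in normal use the defaults apply.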

fmira21 commented 6 months ago

I'll keep this issue open to collect feedback in case others run into similar problems.

phonglnDEV commented 3 months ago

Hi @fmira21, I'm facing this problem. Do you think it's the same problem as @dwjorgeb's? [image]