golemfactory / dapp-runner

GNU Lesser General Public License v3.0

dapp-runner keeps bouncing off the same faulty provider #121

Closed: grisha87 closed this issue 1 year ago

grisha87 commented 1 year ago

When starting the dapp, dapp-manager/yapapi bounces off the same broken provider multiple times (see the logs below).

For the end user, this means that the dapp does not work at all (our community has noticed this and reported it on Discord).

For us, this generates a very high number of invoices for fractions of GLM that get stuck in the "RECEIVED" state (because the application ultimately fails to start and times out after 5 minutes).

We discussed this with Przemysław Rekucki at some point, and we consider it an issue in dapp-runner/yapapi. In such cases, retries should be performed on a different provider, not on the faulty one again. If that is not possible, the script should “fail fast” and exit because there are no other offers to choose from.
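The requested behaviour (never retry on a provider that already failed, and give up once no untried offers remain) can be sketched as below. This is a standalone illustration, not dapp-runner's or yapapi's actual code: `start_with_exclusion`, `try_start`, `NoProvidersLeft`, and `max_attempts` are hypothetical names, and `offers` stands in for the stream of provider IDs coming from the market (which, as in the logs, may offer the same provider again).

```python
from typing import Callable, Iterable, Set


class NoProvidersLeft(Exception):
    """Hypothetical error raised to 'fail fast' when no untried provider works."""


def start_with_exclusion(
    offers: Iterable[str],
    try_start: Callable[[str], bool],
    max_attempts: int = 5,
) -> str:
    """Start the service on the first provider that works, skipping known-bad ones.

    `offers` yields provider IDs (duplicates allowed, as when a provider re-offers);
    `try_start` is a hypothetical callback returning True if the service started.
    """
    failed: Set[str] = set()
    attempts = 0
    for provider_id in offers:
        if provider_id in failed:
            continue  # never bounce off the same faulty provider again
        if attempts >= max_attempts:
            break  # cap total attempts so the run cannot loop until the timeout
        attempts += 1
        if try_start(provider_id):
            return provider_id
        failed.add(provider_id)
    # Offers exhausted (or attempt cap hit) without success: fail fast.
    raise NoProvidersLeft(
        f"no working provider after {attempts} attempt(s); failed: {sorted(failed)}"
    )
```

With offers `["golem-hou-node-04", "golem-hou-node-04", "big-lettuce", "other-node"]` (the last name is made up) and a `try_start` that only succeeds on `"other-node"`, the repeated `golem-hou-node-04` offer is skipped and the function returns `"other-node"`. If only the faulty provider is ever offered, it is tried once and `NoProvidersLeft` is raised instead of re-signing agreements with it.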

Related Golem Portal issue: https://golemproject.atlassian.net/browse/SCT-582

Example logs:

[2023-08-15T06:40:19.245+0000 INFO dapp_runner.log] Using log file `/home/dapp/.local/share/dapp_manager/21b75e86806e4ab5a699000944e82d4e/log`
[2023-08-15T06:40:19.715+0000 WARNING yapapi.storage.gftp] Cannot parse gftp version info '0.12.2-close_wait (e31d28e2 2023-07-12 build #304)'
[2023-08-15T06:40:19.718+0000 INFO yapapi.network] Created network: Network { id: 57c90838a4df4ed0ac6a4adb4c0946a1, ip: 192.168.1.0, mask: 255.255.255.0}
[2023-08-15T06:40:19.854+0000 INFO dapp_runner.runner] Starting app: GLM Query, startup timeout: 0:05:00, maximum running time: 1:00:00
[2023-08-15T06:40:35.891+0000 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'big-lettuce' (0x6d7af1429060a3bfb52290f1d8e8bf3cdba91605)
[2023-08-15T06:40:37.253+0000 INFO yapapi.summary] Received proposals from 1 provider so far
[2023-08-15T06:40:46.988+0000 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'big-lettuce'
[2023-08-15T06:40:50.085+0000 INFO yapapi.summary] [Job 1] Terminated agreement with big-lettuce
[2023-08-15T06:40:55.261+0000 INFO yapapi.summary] Received proposals from 29 providers so far
[2023-08-15T06:40:55.365+0000 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'golem-hou-node-04' (0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb)
[2023-08-15T06:40:55.846+0000 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'golem-hou-node-04'
[2023-08-15T06:40:58.262+0000 INFO yapapi.summary] Received proposals from 69 providers so far
[2023-08-15T06:40:58.314+0000 INFO yapapi.services.service_runner] <DappService-backend starting on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> commissioned
[2023-08-15T06:40:59.603+0000 WARNING yapapi.services.service_runner] Unhandled exception in service
Traceback (most recent call last):
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 318, in _run_instance
    batch = batch_task.result()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/dapp_runner/runner/service.py", line 68, in start
    yield script
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 334, in _run_instance
    fut_result = yield batch
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 719, in process_batches
    results = await get_batch_results()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 700, in get_batch_results
    event = script.process_batch_event(event_class, event_kwargs)
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/script/__init__.py", line 95, in process_batch_event
    raise CommandExecutionError(str(command), event.message, event.stderr)
yapapi.rest.activity.CommandExecutionError: Command 'Start ()' failed on provider; message: 'ExeScript command exited with code 101'
[2023-08-15T06:40:59.605+0000 INFO yapapi.services.service_runner] <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> decommissioned
[2023-08-15T06:40:59.810+0000 INFO yapapi.summary] [Job 1] Terminated agreement with golem-hou-node-04
[2023-08-15T06:41:00.072+0000 INFO yapapi.services.service_runner] Restarting service <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ]>
[2023-08-15T06:41:01.082+0000 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'golem-hou-node-04' (0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb)
[2023-08-15T06:41:01.559+0000 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'golem-hou-node-04'
[2023-08-15T06:41:02.069+0000 INFO yapapi.summary] [Job 1] Accepted invoice from 'golem-hou-node-04', amount: 0.000000108647777778
[2023-08-15T06:41:03.258+0000 INFO yapapi.services.service_runner] <DappService-backend starting on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> commissioned
[2023-08-15T06:41:04.535+0000 WARNING yapapi.services.service_runner] Unhandled exception in service
Traceback (most recent call last):
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 318, in _run_instance
    batch = batch_task.result()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/dapp_runner/runner/service.py", line 68, in start
    yield script
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 334, in _run_instance
    fut_result = yield batch
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 719, in process_batches
    results = await get_batch_results()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 700, in get_batch_results
    event = script.process_batch_event(event_class, event_kwargs)
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/script/__init__.py", line 95, in process_batch_event
    raise CommandExecutionError(str(command), event.message, event.stderr)
yapapi.rest.activity.CommandExecutionError: Command 'Start ()' failed on provider; message: 'ExeScript command exited with code 101'
[2023-08-15T06:41:04.536+0000 INFO yapapi.services.service_runner] <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> decommissioned
[2023-08-15T06:41:04.739+0000 INFO yapapi.summary] [Job 1] Terminated agreement with golem-hou-node-04
[2023-08-15T06:41:04.958+0000 INFO yapapi.services.service_runner] Restarting service <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ]>
[2023-08-15T06:41:05.967+0000 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'golem-hou-node-04' (0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb)
[2023-08-15T06:41:06.347+0000 INFO yapapi.summary] [Job 1] Accepted invoice from 'golem-hou-node-04', amount: 0.000000021510833333
[2023-08-15T06:41:06.455+0000 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'golem-hou-node-04'
[2023-08-15T06:41:07.420+0000 INFO yapapi.services.service_runner] <DappService-backend starting on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> commissioned
[2023-08-15T06:41:08.689+0000 WARNING yapapi.services.service_runner] Unhandled exception in service
Traceback (most recent call last):
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 318, in _run_instance
    batch = batch_task.result()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/dapp_runner/runner/service.py", line 68, in start
    yield script
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 334, in _run_instance
    fut_result = yield batch
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 719, in process_batches
    results = await get_batch_results()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 700, in get_batch_results
    event = script.process_batch_event(event_class, event_kwargs)
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/script/__init__.py", line 95, in process_batch_event
    raise CommandExecutionError(str(command), event.message, event.stderr)
yapapi.rest.activity.CommandExecutionError: Command 'Start ()' failed on provider; message: 'ExeScript command exited with code 101'
[2023-08-15T06:41:08.690+0000 INFO yapapi.services.service_runner] <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> decommissioned
[2023-08-15T06:41:08.906+0000 INFO yapapi.summary] [Job 1] Terminated agreement with golem-hou-node-04
[2023-08-15T06:41:09.124+0000 INFO yapapi.services.service_runner] Restarting service <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ]>
[2023-08-15T06:41:09.611+0000 INFO yapapi.summary] [Job 1] Accepted invoice from 'golem-hou-node-04', amount: 0.000000016400000000
[2023-08-15T06:41:10.135+0000 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'golem-hou-node-04' (0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb)
[2023-08-15T06:41:10.600+0000 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'golem-hou-node-04'
[2023-08-15T06:41:11.559+0000 INFO yapapi.services.service_runner] <DappService-backend starting on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> commissioned
[2023-08-15T06:41:12.826+0000 WARNING yapapi.services.service_runner] Unhandled exception in service
Traceback (most recent call last):
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 318, in _run_instance
    batch = batch_task.result()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/dapp_runner/runner/service.py", line 68, in start
    yield script
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 334, in _run_instance
    fut_result = yield batch
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 719, in process_batches
    results = await get_batch_results()
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/engine.py", line 700, in get_batch_results
    event = script.process_batch_event(event_class, event_kwargs)
  File "/home/dapp/dapp-manager/lib/python3.10/site-packages/yapapi/script/__init__.py", line 95, in process_batch_event
    raise CommandExecutionError(str(command), event.message, event.stderr)
yapapi.rest.activity.CommandExecutionError: Command 'Start ()' failed on provider; message: 'ExeScript command exited with code 101'
[2023-08-15T06:41:12.827+0000 INFO yapapi.services.service_runner] <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ] @ 192.168.1.3> decommissioned
[2023-08-15T06:41:13.030+0000 INFO yapapi.summary] [Job 1] Terminated agreement with golem-hou-node-04
[2023-08-15T06:41:13.233+0000 INFO yapapi.services.service_runner] Restarting service <DappService-backend terminated on golem-hou-node-04 [ 0xa72ac3f5211d6d659c3a7cee5a46aa1583c21bbb ]>
[2023-08-15T06:41:13.912+0000 INFO yapapi.summary] [Job 1] Accepted invoice from 'golem-hou-node-04', amount: 0.000000016066388889
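To quantify the pattern in a log like the one above, a small script can tally confirmed agreements and accepted invoice amounts per provider. This is a sketch that assumes the `yapapi.summary` messages are worded exactly as in this excerpt (other yapapi versions may differ); `summarize` is a hypothetical helper. `Decimal` is used so that the 18-decimal GLM amounts are summed without float rounding.

```python
import re
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Patterns for the yapapi.summary messages as they appear in the log above.
AGREEMENT_RE = re.compile(r"Agreement confirmed by provider '([^']+)'")
INVOICE_RE = re.compile(r"Accepted invoice from '([^']+)', amount: ([0-9.]+)")


def summarize(lines: Iterable[str]) -> Tuple[Counter, Dict[str, Decimal]]:
    """Count confirmed agreements and sum accepted invoice amounts per provider."""
    agreements: Counter = Counter()
    invoiced: Dict[str, Decimal] = {}
    for line in lines:
        match = AGREEMENT_RE.search(line)
        if match:
            agreements[match.group(1)] += 1
            continue
        match = INVOICE_RE.search(line)
        if match:
            name, amount = match.group(1), Decimal(match.group(2))
            invoiced[name] = invoiced.get(name, Decimal(0)) + amount
    return agreements, invoiced
```

Run over this excerpt, it reports four confirmed agreements with `golem-hou-node-04` (against one with `big-lettuce`) and four accepted invoices from it totalling 0.000000162625 GLM, all within about 20 seconds of the first failure.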
shadeofblue commented 1 year ago

@grisha87 could you provide the full, debug-level log file here?

grisha87 commented 1 year ago

@shadeofblue, unfortunately such a log is not available from our production environment (we don't use the debug level there). We could set up staging to use mainnet instead of testnet so that you could try to reproduce it there, but this will take time. Alternatively, you could run dapp-manager/dapp-runner on mainnet a few times until you run into this issue.

grisha87 commented 1 year ago

Hey @shadeofblue, when do you plan to release the updated version of dapp-runner?