golemfactory / yagna-triage

Repository for issues which we don't yet know about enough to assign to proper repo
1 stars 0 forks source link

Most/all agreements fail when running against devnet-beta #196

Closed ederenn closed 2 years ago

ederenn commented 2 years ago

Name: blue yagna version: yagna 0.10.0-rc15 (160bc5a1 2022-03-14 build #206) OS+lang+version (if applicable): mac + Python 3.9.7 + yapapi 0.9.0a1

$ python blender.py
[2022-03-15T12:47:09.599+0100 INFO yapapi] Using log file `blender-yapapi-2022-03-15_12.47.09.log`; in case of errors look for additional information there
yapapi version: 0.9.0a1
Using subnet: devnet-beta, payment driver: erc20, and network: rinkeby

[2022-03-15T12:47:12.022+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'etam'
[2022-03-15T12:47:12.276+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'etam'
[2022-03-15T12:47:12.701+0100 INFO yapapi.summary] Received proposals from 6 providers so far
[2022-03-15T12:47:14.294+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'omx-testnode1'
[2022-03-15T12:47:14.665+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 0
[2022-03-15T12:47:15.706+0100 INFO yapapi.summary] Received proposals from 12 providers so far
[2022-03-15T12:47:17.821+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'omx-testnode1'
[2022-03-15T12:47:19.833+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'bristlecone'
[2022-03-15T12:47:20.181+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'bristlecone'
[2022-03-15T12:47:24.379+0100 INFO yapapi.summary] [Job 1] Terminated agreement with omx-testnode1
[2022-03-15T12:47:24.388+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'sycamore'
[2022-03-15T12:47:27.671+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'sycamore'
[2022-03-15T12:47:27.721+0100 INFO yapapi.summary] Received proposals from 13 providers so far
Task computed: Task(id=1, data=0), result: output_0.png, time: 0:00:13.781179
[2022-03-15T12:47:28.454+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 0
[2022-03-15T12:47:28.455+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 10
[2022-03-15T12:47:29.744+0100 INFO yapapi.summary] [Job 1] Terminated agreement with bristlecone
[2022-03-15T12:47:29.757+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'omx-testnode1'
[2022-03-15T12:47:33.297+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'omx-testnode1'
[2022-03-15T12:47:35.370+0100 INFO yapapi.summary] [Job 1] Terminated agreement with sycamore
[2022-03-15T12:47:35.380+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'glen'
[2022-03-15T12:47:38.673+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'glen'
[2022-03-15T12:47:40.860+0100 INFO yapapi.summary] [Job 1] Terminated agreement with omx-testnode1
Task computed: Task(id=2, data=10), result: output_10.png, time: 0:00:13.875980
[2022-03-15T12:47:42.323+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 10
[2022-03-15T12:47:42.323+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 20
[2022-03-15T12:47:42.874+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'michal'
[2022-03-15T12:47:46.115+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'michal'
[2022-03-15T12:47:48.196+0100 INFO yapapi.summary] [Job 1] Terminated agreement with glen
[2022-03-15T12:47:48.208+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'omx-testnode1'
[2022-03-15T12:47:51.783+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'omx-testnode1'
[2022-03-15T12:47:53.903+0100 INFO yapapi.summary] [Job 1] Terminated agreement with michal
[2022-03-15T12:47:53.920+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'bristlecone'
Task computed: Task(id=3, data=20), result: output_20.png, time: 0:00:13.723595
[2022-03-15T12:47:56.048+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 20
[2022-03-15T12:47:56.048+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 30
[2022-03-15T12:47:57.262+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'bristlecone'
[2022-03-15T12:47:59.484+0100 INFO yapapi.summary] [Job 1] Terminated agreement with omx-testnode1
[2022-03-15T12:47:59.500+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'michal'
[2022-03-15T12:48:02.742+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'michal'
[2022-03-15T12:48:04.843+0100 INFO yapapi.summary] [Job 1] Terminated agreement with bristlecone
[2022-03-15T12:48:08.976+0100 INFO yapapi.summary] [Job 1] Terminated agreement with michal
[2022-03-15T12:48:08.989+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'omx-testnode1'
Task computed: Task(id=4, data=30), result: output_30.png, time: 0:00:13.761870
[2022-03-15T12:48:09.813+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 30
[2022-03-15T12:48:09.814+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 40
[2022-03-15T12:48:12.532+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'omx-testnode1'
[2022-03-15T12:48:14.546+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'michal'
[2022-03-15T12:48:17.865+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'michal'
[2022-03-15T12:48:20.058+0100 INFO yapapi.summary] [Job 1] Terminated agreement with omx-testnode1
[2022-03-15T12:48:20.077+0100 INFO yapapi.summary] [Job 1] Agreement proposed to provider 'q53'
[2022-03-15T12:48:23.278+0100 INFO yapapi.summary] [Job 1] Agreement confirmed by provider 'q53'
Task computed: Task(id=5, data=40), result: output_40.png, time: 0:00:13.721526
[2022-03-15T12:48:23.530+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 40
[2022-03-15T12:48:23.531+0100 INFO yapapi.summary] [Job 1] Task started on provider 'etam', task data: 50
[2022-03-15T12:48:25.364+0100 INFO yapapi.summary] [Job 1] Terminated agreement with michal
Task computed: Task(id=6, data=50), result: output_50.png, time: 0:00:13.706605
[2022-03-15T12:48:37.238+0100 INFO yapapi.summary] [Job 1] Task finished by provider 'etam', task data: 50
[2022-03-15T12:48:38.242+0100 INFO yapapi.executor] [Job 1] Waiting for 2 workers to finish...
[2022-03-15T12:48:42.507+0100 INFO yapapi.executor] [Job 1] Waiting for Executor services to finish...
6 tasks computed, total time: 0:01:32.510203
[2022-03-15T12:48:42.508+0100 INFO yapapi.executor] Golem is shutting down...
[2022-03-15T12:48:42.509+0100 INFO yapapi.summary] [Job 1] Job finished in 92.5s
[2022-03-15T12:48:42.510+0100 INFO yapapi.summary] [Job 1] Negotiated 13 agreements with 7 providers
[2022-03-15T12:48:42.510+0100 INFO yapapi.summary] [Job 1] Provider 'etam' computed 6 tasks
[2022-03-15T12:48:42.510+0100 INFO yapapi.summary] [Job 1] Provider 'omx-testnode1' did not compute any tasks
[2022-03-15T12:48:42.510+0100 INFO yapapi.summary] [Job 1] Provider 'glen' did not compute any tasks
[2022-03-15T12:48:42.511+0100 INFO yapapi.summary] [Job 1] Provider 'michal' did not compute any tasks
[2022-03-15T12:48:42.511+0100 INFO yapapi.summary] [Job 1] Provider 'sycamore' did not compute any tasks
[2022-03-15T12:48:42.511+0100 INFO yapapi.summary] [Job 1] Provider 'michal' did not compute any tasks
[2022-03-15T12:48:42.511+0100 INFO yapapi.summary] [Job 1] Provider 'michal' did not compute any tasks
[2022-03-15T12:48:42.511+0100 INFO yapapi.summary] [Job 1] Provider 'omx-testnode1' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.summary] [Job 1] Provider 'omx-testnode1' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.summary] [Job 1] Provider 'omx-testnode1' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.summary] [Job 1] Provider 'bristlecone' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.summary] [Job 1] Provider 'q53' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.summary] [Job 1] Provider 'bristlecone' did not compute any tasks
[2022-03-15T12:48:42.512+0100 INFO yapapi.executor] All jobs have finished
[2022-03-15T12:48:42.513+0100 INFO yapapi.executor] 2 agreements still unpaid, waiting for invoices...
[2022-03-15T12:48:44.353+0100 INFO yapapi.summary] [Job 1] Accepted invoice from 'etam', amount: 0.01080197226300000
[2022-03-15T12:49:12.515+0100 WARNING yapapi.executor] Unpaid agreements: {'c26d964bb44c053e616dc68349851a0d98553d8939fbc05bf3e7543c6f44bd09'}
[2022-03-15T12:49:12.516+0100 INFO yapapi.executor] Waiting for Golem services to finish...
[2022-03-15T12:49:12.537+0100 INFO yapapi.summary] Total cost: 0.010801972263
[2022-03-15T12:49:12.537+0100 INFO yapapi.summary] Golem engine has shut down

yagna_rCURRENT (6).log blender-yapapi-2022-03-15_12.47.09.log blender-yapapi-2022-03-15_12.39.26.log

johny-b commented 2 years ago

You've got a lot of ActivityCreateFailed because of a timeout there. I've already reported similar problem on 2022-02-28 (although there's no issue for it).

nieznanysprawiciel commented 2 years ago

We need to find out, if there is any interference between devnet Providers. Activity creation fails in case, when ExeUnit starts to long

mfranciszkiewicz commented 2 years ago

The issue is being closed, to be re-opened when necessary.

I have no longer encountered the issue since devnet providers have been restarted.

Before then, the investigation showed that the providers were operating under degraded performance conditions. The prevailing issue were the long database access times, which delayed each action taken by the provider agent (e.g. activity creation). Since the provider restart, further investigation was hampered and lead to examination of the following potential problems and symptoms:

Currently, provider nodes are behaving correctly.

etam commented 2 years ago

@mfranciszkiewicz It's happening again.

johny-b commented 2 years ago

Side note, but maybe by any chance this is important:

Goth tests are becoming less and less stable in the recent days (e.g. yapapi nightly), and I don't have any idea what might be the cause, as there are not much code changes.

Maybe this is somehow connected? Like e.g. "For some unknown reason the performance of the central image repository degrades in time, and because of this:

Again, I don't have any good reason to believe this is the case, except that there are two weird things happening at the same time, so maybe there's a single weirdness behind them :)

etam commented 2 years ago

Screenshot_20220511_112534

etam commented 2 years ago

It didn't happen since 0.10.1 release. @mfranciszkiewicz is it possible that it's fixed there?

EvilSeeQu-sys commented 2 years ago

Ping Marek F

mfranciszkiewicz commented 2 years ago

@etam @EvilSeeQu-sys There was no specific fix targeting this issue, although it's possible that it has been fixed. I'm closing this issue, to be re-opened when necessary.