golemfactory / yapapi

Python high-level API for Golem.
https://yapapi.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0
48 stars 23 forks source link

yagna: "Allocation [...] not found. Release failed." -> yapapi: "Error when shutting down Golem engine: (404)" #923

Open etam opened 2 years ago

etam commented 2 years ago

Description

OS: ubuntu 18.04 yagna daemon version: 0.10.0 Python version: 3.6.9 yapapi library version: 0.9.0

Description of the issue: Computation fails with

[2022-04-05T15:38:09.425+0000 INFO yapapi.executor] All jobs have finished
[2022-04-05T15:38:09.425+0000 INFO yapapi.executor] 135 agreements still unpaid, waiting for invoices...
[2022-04-05T15:38:39.427+0000 WARNING yapapi.executor] Unpaid agreements: {'adb905298e386e8445a7c882a5e762f46da0cbf7f905107b1af74e620ea41511', '349e51d9fd9228f9a3834cb61539087d23c6d49882398eeffbf69b5b85f63c97', '384ca0392b4282be3e92bd6faeed41707d624036cfea9645ef302623101
[2022-04-05T15:38:39.428+0000 INFO yapapi.executor] Waiting for Golem services to finish...
[2022-04-05T15:38:39.467+0000 INFO yapapi.summary] Total cost: 0
[2022-04-05T15:38:39.468+0000 ERROR yapapi.summary] Error when shutting down Golem engine: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Content-Length': '16', 'Content-Type': 'application/json', 'Date': 'Tue, 05 Apr 2022 15:38:39 GMT')>
HTTP response body: {"message":null}

yapapi version: 0.9.0
Using subnet: public-beta, payment driver: erc20, and network: polygon

Task computed: Task(id=8, data=TaskData(i=7, output_dir=PosixPath('[...]'), task_time=datetime.timedelta(0, 3600))), time: 1:00:02.270346
Task computed: Task(id=2, data=TaskData(i=1, output_dir=PosixPath('[...]'), task_time=datetime.timedelta(0, 3600))), time: 1:00:02.100750
tasks computed: 2, total time: 2:14:08.739454
Traceback (most recent call last):
  File "drone/run.py", line 243, in <module>
    main()
  File "drone/run.py", line 223, in main
    loop.run_until_complete(task)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "drone/run.py", line 188, in async_main
    print(f"tasks computed: {num_tasks_computed}, "
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/golem.py", line 291, in __aexit__
    return await self._stop_with_exc_info(*exc_info)
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/golem.py", line 295, in _stop_with_exc_info
    res = await self._engine.stop(*exc_info)
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/engine.py", line 222, in stop
    return await self._stack.__aexit__(None, None, None)
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/async_exit_stack/_async_exit_stack.py", line 208, in __aexit__
    raise exc_details[1]
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/async_exit_stack/_async_exit_stack.py", line 191, in __aexit__
    cb_suppress = await cb(*exc_details)
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/yapapi/rest/payment.py", line 111, in __aexit__
    await self._api.release_allocation(self._id)
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/api_client.py", line 206, in __call_api
    raise e
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/api_client.py", line 202, in __call_api
    _request_timeout=_request_timeout,
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/rest.py", line 255, in DELETE
    body=body,
  File "/home/golem-dev0/releases/yapapi-0.9.0/lib/python3.6/site-packages/ya_payment/rest.py", line 180, in request
    raise ApiException(http_resp=r)
ya_payment.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Content-Length': '16', 'Content-Type': 'application/json', 'Date': 'Tue, 05 Apr 2022 15:38:39 GMT')>
HTTP response body: {"message":null}

At the same time, yagna says

[2022-04-05T15:38:39.460+0000 WARN  ya_payment::dao::allocation] Allocation [...] not found. Release failed.

Actual result: I already wrote it.

Steps To Reproduce

Run the example drone.py, but with long (1h) tasks.

Expected behavior

It should not fail with tracebacks.

Logs and any additional context

logs.zip

wkargul commented 2 years ago

When analysing logs please bear in mind that "vague log" -> [2022-04-05T15:38:39.460+0000 WARN ya_payment::dao::allocation] Allocation 979bf31e-6361-4b53-ac88-3fa86a936abc not found. Release failed.

might be the result of allocation release due to timeout of the allocation set when it was created.

Allocation was already released when its timeout passed -> [2022-04-05T13:54:00.631+0000 INFO ya_payment::dao::allocation] Allocation 979bf31e-6361-4b53-ac88-3fa86a936abc released.

This "vague log" issue has been detected and resolved here https://github.com/golemfactory/yagna/issues/1885 (check task for more details). I hope the implemented solution will be released with the nearest patch. I don't know if this has any effect on yapapi error which you described above.

shadeofblue commented 2 years ago

so, most likely a matter of catching the relevant 404 and just maybe issuing a warning to the log...

plus, it might warrant an integration test...

nieznanysprawiciel commented 2 years ago

We are going to change error code to HTTP Gone (410) in case allocation was already released https://github.com/golemfactory/yagna/issues/1931

filipgolem commented 2 years ago

Fix: https://github.com/golemfactory/yapapi/pull/925 @nieznanysprawiciel This fix ignores both 404 and 410.

filipgolem commented 2 years ago

PR that fixes allocation timeout: https://github.com/golemfactory/yapapi/pull/926

shadeofblue commented 2 years ago

PR that fixes allocation timeout: #926

it does not fix the issue, it's a work-around...

tworec commented 2 years ago

What is the proper solution here?

wkargul commented 2 years ago

In regards to changes in yagna: for those already released allocations we will be returning http gone (410) for any subsequent get/release on them.

filipgolem commented 2 years ago

What is the proper solution here?

yapapi should create a new allocation after yagna releases the old allocation (if the budget was not spent). This functionality can be used to add spending limits, i.e. instead of total budget for computation, the user can set budget per hour.