Closed: jalas167 closed this 1 year ago
Investigation result (meeting on 14.04 with @grisha87, @lucekdudek and @nieznanysprawiciel): The problem can be reproduced by creating 2 applications at the same time. After one application is closed, both networks disappear. When removing a network, Yagna internally clears the ownership map, but instead of removing only that one network, it removes the whole key for the given identity (the network itself isn't removed). Subsequent calls referencing this network will fail, because the identity is no longer registered as the owner of the network.
RC with fixes is here: https://github.com/golemfactory/yagna/releases/tag/pre-rel-v0.13.0-rc3
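To illustrate the ownership-map bug from the investigation result, here is a minimal Python sketch. It is not yagna's actual code (yagna is written in Rust); the class `OwnershipMap` and its methods are hypothetical names, and the data layout (identity mapped to a set of network ids) is an assumption made only for illustration.

```python
from collections import defaultdict


class OwnershipMap:
    """Hypothetical model: identity -> set of network ids it owns."""

    def __init__(self):
        self._owned = defaultdict(set)

    def add(self, identity, network_id):
        self._owned[identity].add(network_id)

    def is_owner(self, identity, network_id):
        return network_id in self._owned.get(identity, set())

    def remove_buggy(self, identity, network_id):
        # Mirrors the described bug: the whole key for the identity is
        # dropped, so ownership of *all* its networks is lost.
        self._owned.pop(identity, None)

    def remove_fixed(self, identity, network_id):
        # Removes only the requested network; the key is dropped only
        # once the identity owns nothing else.
        networks = self._owned.get(identity)
        if networks is None:
            return
        networks.discard(network_id)
        if not networks:
            del self._owned[identity]


def ownership_after_removal(remove_method_name):
    """Create two networks for one identity, remove 'net-a', and report
    whether the identity still owns 'net-b'."""
    ownership = OwnershipMap()
    ownership.add("identity-1", "net-a")
    ownership.add("identity-1", "net-b")
    getattr(ownership, remove_method_name)("identity-1", "net-a")
    return ownership.is_owner("identity-1", "net-b")


print("buggy removal keeps net-b:", ownership_after_removal("remove_buggy"))
print("fixed removal keeps net-b:", ownership_after_removal("remove_fixed"))
```

With the buggy variant, the second application's network is still there but its owner check fails, which would match the reported symptom of later calls being rejected.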
@SDK , we have an interesting case in production observed by @broadcastmonkey Paweł Burgchardt , the application is currently running on Golem (according to dapp-manager) - https://291f9f8df0964af8854dfe733ab8b8f2.portal.golem.network/
The app was deployed, not used for 30+ minutes, and then the link was used. This leads to an HTTP 500 for the users. I took a look at the stderr of this app and got the following information (next message).
Can you please elaborate on what's wrong?
The timeline (referring to the timestamps in the logs):
- `2023-04-12T08:13:20.285+0000` (from requester): the provider gets commissioned.
- `2023-04-12T08:45:09.808+0000` (from requester): Paweł tries to access the app after a 32 min delay, gets an `HTTP 500`, and we observe the `aiohttp.client_exceptions.WSServerHandshakeError: 404, message='Invalid response status', url=URL('ws://127.0.0.1:7465/net-api/v1/net/79e859796e5c40f5a59657d8bc5badd2/tcp/192.168.0.4/80')` issue in the logs.
- `2023-04-12T09:13:24.764+0000`: the app gets terminated after 1h, which exposes the fact that the network was already removed: `[2023-04-12T09:13:24.764+0000 WARNING yapapi.network] Tried removing a network which doesn't exist. network_id=79e859796e5c40f5a59657d8bc5badd2`
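The intervals quoted in the timeline (the 32 min delay and the 1h lifetime) can be checked directly from the log timestamps. The snippet below is a small sketch for that; the function name `seconds_between` is made up for this example, and the format string assumes the timestamp layout seen in these logs.

```python
from datetime import datetime

# Timestamp layout as it appears in the logs above.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def seconds_between(start, end):
    """Return the number of seconds elapsed between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


commissioned = "2023-04-12T08:13:20.285+0000"
first_access = "2023-04-12T08:45:09.808+0000"
terminated = "2023-04-12T09:13:24.764+0000"

print("idle before first access [min]:",
      round(seconds_between(commissioned, first_access) / 60, 1))
print("total app lifetime [min]:",
      round(seconds_between(commissioned, terminated) / 60, 1))
```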
The logs
Related logs on the provider:
https://yastats.golem.network/explore?orgId=1&left=%7B%22datasource%22:%22iT4zPPcGz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22iT4zPPcGz%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22%7Bhostname%3D%5C%22fractal_01_0.h%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%221681287145000%22,%22to%22:%221681290805000%22%7D%7D
Related logs on the requestor:
https://yastats.golem.network/explore?orgId=1&left=%7B%22datasource%22:%22iT4zPPcGz%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22iT4zPPcGz%22%7D,%22editorMode%22:%22code%22,%22expr%22:%22%7Bhostname%3D%5C%22portal.golem.network%5C%22,%20service%3D%5C%22yagna%5C%22%7D%20%21%3D%20%5C%22driver::cron%5C%22%20%21%3D%20%5C%22erc20::wallet%5C%22%20%21%3D%20%5C%22not%20responding%20to%20ping%5C%22%20%21%3D%20%5C%22ya_market::negotiation%5C%22%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%221681287200000%22,%22to%22:%221681290804000%22%7D%7D
PLEASE NOTE THAT THIS IS THE GOLEM PORTAL REQUESTOR, SO YOU WILL SEE LOGS FROM ALL ACTIVITY, NOT ONLY THE ONE AFFECTED.
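The log links above are hard to read in their URL-encoded form. As a convenience, here is a sketch that decodes the provider link to show the log query and the time range it covers. It assumes the `left` query parameter holds URL-encoded JSON, as it does in these two links; the helper names `decode_explore_url` and `to_utc` are made up for this example.

```python
import json
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The provider logs link from above, split only for readability.
PROVIDER_LOGS_URL = (
    "https://yastats.golem.network/explore?orgId=1"
    "&left=%7B%22datasource%22:%22iT4zPPcGz%22,"
    "%22queries%22:%5B%7B%22refId%22:%22A%22,"
    "%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22iT4zPPcGz%22%7D,"
    "%22editorMode%22:%22code%22,"
    "%22expr%22:%22%7Bhostname%3D%5C%22fractal_01_0.h%5C%22%7D%20%7C%3D%20%60%60%22,"
    "%22queryType%22:%22range%22%7D%5D,"
    "%22range%22:%7B%22from%22:%221681287145000%22,%22to%22:%221681290805000%22%7D%7D"
)


def decode_explore_url(url):
    """Return the JSON state stored in the link's `left` query parameter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    return json.loads(query["left"][0])


def to_utc(epoch_ms):
    """Convert an epoch-milliseconds string to an aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)


state = decode_explore_url(PROVIDER_LOGS_URL)
print("query:", state["queries"][0]["expr"])
print("from: ", to_utc(state["range"]["from"]).isoformat())
print("to:   ", to_utc(state["range"]["to"]).isoformat())
```

The decoded range starts shortly before the network creation and ends right after the app termination, so it covers the whole timeline described earlier.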
Within this timeframe I was able to find one interesting entry:
But it's not the one which was used by Paweł's instance. So I decided to check the VPN related logs:
Here you can see that `79e859796e5c40f5a59657d8bc5badd2` was created at `2023-04-12T08:12:28.082+0000` and it was never stopped (at least according to yagna's logs).
Using this information I checked back the provider logs and was able to find:
This looks like establishing the VPN connection really failed? Another line in the provider logs points to, so it seems that the VPN was actually working?