golemfactory / yagna

An open platform and marketplace for distributed computations
GNU General Public License v3.0
399 stars 61 forks source link

NET - packet timeouts #2148

Open mfranciszkiewicz opened 2 years ago

mfranciszkiewicz commented 2 years ago

Common error found in logs while executing ya-perf:

[2022-08-05T15:49:23.941+0200 ERROR yapapi.services.service_runner] Unhandled engine error
Traceback (most recent call last):
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 279, in _run_instance
    fut_result = yield batch
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/engine.py", line 699, in process_batches
    remote = await activity.send(batch, deadline=batch_deadline)
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/rest/activity.py", line 86, in send
    batch_id = await self._api.call_exec(self._id, yaa.ExeScriptRequest(text=script_txt))
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/api_client.py", line 205, in __call_api
    raise e
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/api_client.py", line 193, in __call_api
    response_data = await self.request(
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/rest.py", line 268, in POST
    return await self.request(
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/rest.py", line 180, in request
    raise ApiException(http_resp=r)
ya_activity.exceptions.ApiException: (408)
Reason: Request Timeout
HTTP response headers: <CIMultiDictProxy('Content-Length': '21', 'Content-Type': 'application/json', 'Date': 'Fri, 05 Aug 2022 13:49:23 GMT')>
HTTP response body: {"message":"Timeout"}
nieznanysprawiciel commented 2 years ago

This is the same bug as: https://github.com/golemfactory/yagna/issues/2148 And Marek's conclusions from investigation (priv) are the same as my conclusions, when I was investigating previous issue

nieznanysprawiciel commented 2 years ago

Analysis made by @mfranciszkiewicz ya-perf

[2022-08-07T14:29:18.443+0200 INFO yapapi.utils] Provider: office_8_219.h. 🚀 Starting transfer test. 
[2022-08-07T14:29:24.526+0200 ERROR yapapi.services.service_runner] Unhandled engine error
Traceback (most recent call last):
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/services/service_runner.py", line 279, in _run_instance
    fut_result = yield batch
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/engine.py", line 699, in process_batches
    remote = await activity.send(batch, deadline=batch_deadline)
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/yapapi/rest/activity.py", line 86, in send
    batch_id = await self._api.call_exec(self._id, yaa.ExeScriptRequest(text=script_txt))
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/api_client.py", line 205, in __call_api
    raise e
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/api_client.py", line 193, in __call_api
    response_data = await self.request(
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/rest.py", line 268, in POST
    return await self.request(
  File "/home/mf/Projects/ya-perf/venv/lib/python3.10/site-packages/ya_activity/rest.py", line 180, in request
    raise ApiException(http_resp=r)
ya_activity.exceptions.ApiException: (408)
Reason: Request Timeout
HTTP response headers: <CIMultiDictProxy('Content-Length': '21', 'Content-Type': 'application/json', 'Date': 'Sun, 07 Aug 2022 12:29:24 GMT')>
HTTP response body: {"message":"Timeout"}

[2022-08-07T14:29:24.527+0200 INFO yapapi.services.service_runner] <PerformanceService terminated on office_8_219.h [ 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 ] @ 192.168.0.11> decommissioned

ya-provider

[2022-08-07T12:29:18.627+0000 DEBUG ya_net::hybrid::service] Handle request 3938222459762543 to /net/0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73/exeunit/5e38848c786e459a9ad8d3f8fc19ecf8/Exec from 0xc63d5c103d64d00e06ce8b624643a590666cb923
[2022-08-07T12:29:18.628+0000 DEBUG ya_net::hybrid::service] Handle request 5852474253408954: reply chunk (89 B)
[2022-08-07T12:29:18.628+0000 DEBUG ya_net::hybrid::service] Handle request 3938222459762543: reply chunk (73 B)
[2022-08-07T12:29:18.674+0000 DEBUG ya_net::hybrid::service] Local bus handler (0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 -> 0xc63d5c103d64d00e06ce8b624643a590666cb923), address: /net/0xc63d5c103d64d00e06ce8b624643a590666cb923/gftp/7e1839fd5b1f59802cdf1f098dd5198e49b2a242ec43a5e2f107d2e2e57b0f25/GetMetadata, id: 4302352500922899 -> send message to remote (206 B)
[2022-08-07T12:29:18.694+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:64341:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:1: received duplicate ACK for seq 3953180533 (duplicate nr 1)
[2022-08-07T12:29:18.700+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:64341:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:1: received duplicate ACK for seq 3953180533 (duplicate nr 2)
[2022-08-07T12:29:19.281+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:20.152+0000 DEBUG actix_web::middleware::logger] Error in response: Timeout
[2022-08-07T12:29:20.152+0000 INFO  actix_web::middleware::logger] 127.0.0.1 "GET /activity-api/v1/events?afterTimestamp=2022-08-07T12%3A29%3A10.515943987%2B00%3A00&appSessionId=office_8_219.h-1294022&timeout=3 HTTP/1.1" 408 21 "-" "-" 3.001394
[2022-08-07T12:29:20.283+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:21.285+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:21.417+0000 DEBUG ya_activity::api::common] get_activity_state_web
[2022-08-07T12:29:21.418+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:21.418+0000 INFO  actix_web::middleware::logger] 127.0.0.1 "GET /activity-api/v1/activity/5e38848c786e459a9ad8d3f8fc19ecf8/state HTTP/1.1" 200 27 "-" "-" 0.000693
[2022-08-07T12:29:22.287+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:22.648+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:1:c8e6:15ed:79ba:51ac:778:4e35:16eb:405:47705: segment not in receive window (369126560..369126561 not intersecting 369126561..373320865), will send challenge ACK
[2022-08-07T12:29:22.648+0000 DEBUG smoltcp::socket::tcp] sending sACK option with current assembler ranges
[2022-08-07T12:29:22.648+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:32645:c8e6:15ed:79ba:51ac:778:4e35:16eb:405:1: segment not in receive window (1828851718..1828851719 not intersecting 1828851719..1833046023), will send challenge ACK
[2022-08-07T12:29:23.290+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:23.424+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:1:f03c:be04:92c2:e7fb:83dd:8ddb:98f7:af94:49030: segment not in receive window (3941291969..3941291970 not intersecting 3941291970..3945486274), will send challenge ACK
[2022-08-07T12:29:23.424+0000 DEBUG smoltcp::socket::tcp] sending sACK option with current assembler ranges
[2022-08-07T12:29:23.424+0000 DEBUG smoltcp::socket::tcp] tcp:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:64301:f03c:be04:92c2:e7fb:83dd:8ddb:98f7:af94:1: segment not in receive window (806563356..806563357 not intersecting 806563357..810757661), will send challenge ACK
[2022-08-07T12:29:23.690+0000 DEBUG ya_sb_router::connection] [127.0.0.1:59648] no data for: 5.062455187s, sending ping (buffer=0)
[2022-08-07T12:29:24.154+0000 DEBUG actix_web::middleware::logger] Error in response: Timeout
[2022-08-07T12:29:24.154+0000 INFO  actix_web::middleware::logger] 127.0.0.1 "GET /activity-api/v1/events?afterTimestamp=2022-08-07T12%3A29%3A10.515943987%2B00%3A00&appSessionId=office_8_219.h-1294022&timeout=3 HTTP/1.1" 408 21 "-" "-" 3.001770
[2022-08-07T12:29:24.292+0000 DEBUG ya_activity::dao::activity_state] getting activity state
[2022-08-07T12:29:24.540+0000 DEBUG ya_net::hybrid::service] Handle request 7505208452886265 to /net/0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73/market/protocol/mk1/negotiation/provider/agreement/AgreementTerminated from 0xc63d5c103d64d00e06ce8b624643a590666cb923
[2022-08-07T12:29:24.541+0000 DEBUG ya_market::protocol::negotiation::provider] Negotiation API: Agreement [P-d3b48813e1d2e54d6ca415c715c41737884f3710d0394795e8b862c2a59e2ceb] terminated by [0xc63d5c103d64d00e06ce8b624643a590666cb923].
[2022-08-07T12:29:24.541+0000 DEBUG ya_market::db::dao::agreement_events] Event timestamp: 2022-08-07 12:29:10.489919406, type: Approved
[2022-08-07T12:29:24.542+0000 DEBUG ya_market::db::dao::agreement_events] Event timestamp: 2022-08-07 12:29:24.541822443, type: Terminated
[2022-08-07T12:29:24.542+0000 INFO  ya_market::negotiation::common] Received terminate Agreement [P-d3b48813e1d2e54d6ca415c715c41737884f3710d0394795e8b862c2a59e2ceb] from [0xc63d5c103d64d00e06ce8b624643a590666cb923]. Reason: 'Work cancelled'

exe-unit

[2022-08-07T12:29:13.307+0000 DEBUG ya_runtime_api::server::client] sending request: Request { id: 6, command: Some(Run(RunProcess { bin: "/bin/bash", args: ["bash", "-c", "truncate -s 1M /golem/dummy.dat"], work_dir: "", stdout: None, stderr: None })) }
[2022-08-07T12:29:13.307+0000 DEBUG ya_runtime_api::server::client] waiting for response
[2022-08-07T12:29:13.308+0000 DEBUG ya_runtime_api::server::client] got response: Response { event: false, id: 6, command: Some(Run(RunProcess { pid: 3 })) }
[2022-08-07T12:29:13.314+0000 DEBUG ya_exe_unit::handlers::local] Entering state: StatePair(Ready, None)
[2022-08-07T12:29:13.314+0000 DEBUG ya_exe_unit::handlers::local] Report: {"batches_done":1,"batches_pending":1,"cmds_done":4,"cmds_pending":1}
[2022-08-07T12:29:18.628+0000 DEBUG ya_exe_unit::handlers::local] Entering state: StatePair(Ready, Some(Ready))
[2022-08-07T12:29:18.628+0000 DEBUG ya_exe_unit::handlers::local] Report: {"batches_done":2,"batches_pending":1,"cmds_done":5,"cmds_pending":1}
[2022-08-07T12:29:18.673+0000 INFO  ya_exe_unit] Executing command: Transfer { from: "gftp://0xc63d5c103d64d00e06ce8b624643a590666cb923/7e1839fd5b1f59802cdf1f098dd5198e49b2a242ec43a5e2f107d2e2e57b0f25", to: "container:/golem/output/dummy", args: TransferArgs { format: None, depth: None, fileset: None } }
[2022-08-07T12:29:18.673+0000 INFO  ya_exe_unit::service::transfer] Transferring Url { scheme: "gftp", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("0xc63d5c103d64d00e06ce8b624643a590666cb923")), port: None, path: "/7e1839fd5b1f59802cdf1f098dd5198e49b2a242ec43a5e2f107d2e2e57b0f25", query: None, fragment: None } to Url { scheme: "container", cannot_be_a_base: false, username: "", password: None, host: None, port: None, path: "/golem/output/dummy", query: None, fragment: None }
[2022-08-07T12:29:18.673+0000 DEBUG ya_transfer] Transferring from offset: 0
[2022-08-07T12:29:18.673+0000 DEBUG ya_transfer::file] Transferring to file: /home/golem-dev/hybrid/provider_datadir/exe-unit/work/d3b48813e1d2e54d6ca415c715c41737884f3710d0394795e8b862c2a59e2ceb/5e38848c786e459a9ad8d3f8fc19ecf8/vol-64e62a2f-e2a3-4f2b-8dad-823f55ad9328/dummy
[2022-08-07T12:29:24.543+0000 INFO  ya_exe_unit::service::signal] Caught signal: 15
[2022-08-07T12:29:24.543+0000 INFO  ya_exe_unit::handlers::local] Shutting down: Interrupted by signal: 15: {"batches_done":2,"batches_pending":1,"cmds_done":5,"cmds_pending":1}
[2022-08-07T12:29:24.543+0000 DEBUG ya_exe_unit::handlers::local] Entering state: StatePair(Ready, Some(Terminated))
[2022-08-07T12:29:24.543+0000 DEBUG ya_exe_unit::handlers::local] Report: {"batches_done":2,"batches_pending":1,"cmds_done":5,"cmds_pending":1}
[2022-08-07T12:29:24.545+0000 INFO  ya_exe_unit::runtime::process] Shutting down the runtime process: Interrupted(0)
[2022-08-07T12:29:24.545+0000 INFO  ya_exe_unit::network::vpn] [vpn] shutting down: Finished
[2022-08-07T12:29:24.545+0000 INFO  ya_exe_unit::network::vpn] [vpn] stopping service
[2022-08-07T12:29:24.545+0000 DEBUG ya_runtime_api::server::client] sending request: Request { id: 7, command: Some(Shutdown(Shutdown)) }
[2022-08-07T12:29:24.545+0000 DEBUG ya_runtime_api::server::client] waiting for response
[2022-08-07T12:29:24.592+0000 DEBUG ya_runtime_api::server::client] got response: Response { event: false, id: 7, command: Some(Shutdown(Shutdown)) }
[2022-08-07T12:29:24.592+0000 INFO  ya_exe_unit::service::transfer] Transfer service stopped
[2022-08-07T12:29:24.592+0000 INFO  ya_exe_unit::runtime::process] Shutting down the runtime process: Finished
[2022-08-07T12:29:24.592+0000 WARN  ya_exe_unit] Batch 550c548d9fa14efb9c2f2a6d6d1bf81a execution interrupted: Local service error: State error: Unexpected state: StatePair(Ready, Some(Terminated)), expected StatePair(Ready, Some(Ready))
[2022-08-07T12:29:24.592+0000 DEBUG ya_exe_unit::service::signal] Signal monitoring service stopped
[2022-08-07T12:29:24.592+0000 DEBUG ya_exe_unit::handlers::local] Entering state: StatePair(Terminated, None)
[2022-08-07T12:29:24.592+0000 DEBUG ya_exe_unit::handlers::local] Report: {"batches_done":3,"batches_pending":0,"cmds_done":6,"cmds_pending":0}
[2022-08-07T12:29:24.593+0000 WARN  ya_runtime_api::server::client] Unable to kill process 1618420: signal: 9
[2022-08-07T12:29:24.593+0000 WARN  ya_runtime_api::server::client] Unable to update process 1618420 status: receiver is gone
[2022-08-07T12:29:24.600+0000 INFO  ya_exe_unit::handlers::local] Shutdown process complete
[2022-08-07T12:29:24.600+0000 INFO  ya_exe_unit::runtime::process] Runtime handler stopped

Requestor chosen fragments

[2022-08-07T14:29:18.551+0200 DEBUG ya_net::hybrid::service] Local bus handler (0xc63d5c103d64d00e06ce8b624643a590666cb923 -> 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73), address: /net/0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73/exeunit/5e38848c786e459a9ad8d3f8fc19ecf8/Exec, id: 3938222459762543 -> send message to remote (471 B)

[2022-08-07T14:29:21.597+0200 DEBUG ya_net::hybrid::service] Local bus handler (0xc63d5c103d64d00e06ce8b624643a590666cb923 -> 0x052877c2324abb1915d25acc3da69e11871f733d), address: /net/0x052877c2324abb1915d25acc3da69e11871f733d/exeunit/fcc16b22a4094e6abd869151b8beff00/GetExecBatchResults, id: 7325511871753383 -> send message to remote (318 B)

[2022-08-07T14:29:24.525+0200 INFO  actix_web::middleware::logger] 127.0.0.1 "POST /activity-api/v1/activity/5e38848c786e459a9ad8d3f8fc19ecf8/exec HTTP/1.1" 408 21 "-" "OpenAPI-Generator//python" 6.004931
[2022-08-07T14:29:24.531+0200 DEBUG ya_market::protocol::negotiation::common] Propagating TerminateAgreement: [P-d3b48813e1d2e54d6ca415c715c41737884f3710d0394795e8b862c2a59e2ceb]. Reason: Some(Reason { message: "Work cancelled", extra: Object({"golem.requestor.code": String("Cancelled")}) })

[2022-08-07T14:29:24.564+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 (full: true, code: 0, id: 3938222459762543) 47 B

Requestor more logs

[2022-08-07T14:29:18.551+0200 DEBUG ya_net::hybrid::service] Local bus handler (0xc63d5c103d64d00e06ce8b624643a590666cb923 -> 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73), address: /net/0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73/exeunit/5e38848c786e459a9ad8d3f8fc19ecf8/Exec, id: 3938222459762543 -> send message to remote (471 B)
[2022-08-07T14:29:18.552+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: received duplicate ACK for seq 2056690627 (duplicate nr 1)
[2022-08-07T14:29:18.553+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: received duplicate ACK for seq 724369533 (duplicate nr 1)
[2022-08-07T14:29:18.557+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: reset duplicate ACK count
[2022-08-07T14:29:18.557+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: received duplicate ACK for seq 3289187352 (duplicate nr 1)
[2022-08-07T14:29:18.558+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x6702be6ff0cd0f988257178d34222820e56c8a35 (full: true, code: 0, id: 243117613612033) 63 B
[2022-08-07T14:29:18.558+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x41df232dfa1fa62f9e8484da99550d795c32a8d6 (full: true, code: 0, id: 4920959003658880) 63 B
[2022-08-07T14:29:18.558+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 (full: true, code: 0, id: 2841584507523505) 63 B
[2022-08-07T14:29:18.558+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 (full: true, code: 0, id: 7507240530561455) 63 B
[2022-08-07T14:29:18.559+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: reset duplicate ACK count
[2022-08-07T14:29:18.559+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: received duplicate ACK for seq 2056691242 (duplicate nr 1)
[2022-08-07T14:29:18.562+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:46701:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:1: reset duplicate ACK count
[2022-08-07T14:29:18.563+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:46701:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:1: received duplicate ACK for seq 2635842915 (duplicate nr 1)
[2022-08-07T14:29:18.563+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:21749:6702:be6f:f0cd:f98:8257:178d:3422:2820:1: reset duplicate ACK count
[2022-08-07T14:29:18.564+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:21749:6702:be6f:f0cd:f98:8257:178d:3422:2820:1: received duplicate ACK for seq 3886264169 (duplicate nr 1)
[2022-08-07T14:29:18.572+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: reset duplicate ACK count
[2022-08-07T14:29:18.573+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: received duplicate ACK for seq 1989123030 (duplicate nr 1)
[2022-08-07T14:29:18.575+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:11634:34e1:21c:5827:6f5d:21ed:4c4:73b9:a459:1: reset duplicate ACK count
[2022-08-07T14:29:18.579+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x41df232dfa1fa62f9e8484da99550d795c32a8d6 (full: true, code: 0, id: 4319667352412661) 63 B
[2022-08-07T14:29:18.579+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x41df232dfa1fa62f9e8484da99550d795c32a8d6 (full: true, code: 0, id: 2099662337745900) 63 B
[2022-08-07T14:29:18.580+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: reset duplicate ACK count
[2022-08-07T14:29:18.581+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:11634:34e1:21c:5827:6f5d:21ed:4c4:73b9:a459:1: received duplicate ACK for seq 3461237881 (duplicate nr 1)
[2022-08-07T14:29:18.582+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: received duplicate ACK for seq 724370149 (duplicate nr 1)
[2022-08-07T14:29:18.582+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: reset duplicate ACK count
[2022-08-07T14:29:18.583+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: received duplicate ACK for seq 3441561533 (duplicate nr 1)
[2022-08-07T14:29:18.583+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: reset duplicate ACK count
[2022-08-07T14:29:18.584+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: received duplicate ACK for seq 2056691856 (duplicate nr 1)
[2022-08-07T14:29:18.585+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:46701:7698:e9ef:4007:e2e1:f310:c19c:72a0:1dbf:1: reset duplicate ACK count
[2022-08-07T14:29:18.586+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: reset duplicate ACK count
[2022-08-07T14:29:18.586+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: received duplicate ACK for seq 1989123646 (duplicate nr 1)
[2022-08-07T14:29:18.589+0200 DEBUG actix_http::h1::dispatcher] cannot read request payload
[2022-08-07T14:29:18.595+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: reset duplicate ACK count
[2022-08-07T14:29:18.596+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:21749:6702:be6f:f0cd:f98:8257:178d:3422:2820:1: reset duplicate ACK count
[2022-08-07T14:29:18.597+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x6702be6ff0cd0f988257178d34222820e56c8a35 (full: true, code: 0, id: 8007694462221922) 63 B
[2022-08-07T14:29:18.597+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xd9d9e50effdcba285343ba2e140bc13e71d8492d (full: true, code: 0, id: 3738074328547853) 63 B
[2022-08-07T14:29:18.597+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xd9d9e50effdcba285343ba2e140bc13e71d8492d (full: true, code: 0, id: 1935628143492243) 63 B
[2022-08-07T14:29:18.606+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x34e1021c58276f5d21ed04c473b9a459eecfe169 (full: true, code: 0, id: 4712585445914641) 63 B
[2022-08-07T14:29:18.606+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xf6e1480b1db373588778301da78bbd812179d170 (full: true, code: 0, id: 6902872488730986) 63 B
[2022-08-07T14:29:18.606+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xf6e1480b1db373588778301da78bbd812179d170 (full: true, code: 0, id: 5694575516961247) 63 B
[2022-08-07T14:29:18.606+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x79fda90eb37ab53304e13c6a263d51d3f517742a (full: true, code: 0, id: 4529994619048291) 63 B
[2022-08-07T14:29:18.614+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x0608c92b81a4d9e81f2cec7d852a406676208af3 (full: true, code: 0, id: 1660599750396632) 63 B
[2022-08-07T14:29:18.614+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x0608c92b81a4d9e81f2cec7d852a406676208af3 (full: true, code: 0, id: 5343330913295540) 63 B
[2022-08-07T14:29:18.615+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x703e8affd41cc0b237e1d0c567fd617b3117b6c6 (full: true, code: 0, id: 7091128462534486) 63 B
[2022-08-07T14:29:18.615+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x25ec5e43859d2d190365432a101996bdb8ca32f9 (full: true, code: 0, id: 7409960064147237) 63 B
[2022-08-07T14:29:18.618+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x34e1021c58276f5d21ed04c473b9a459eecfe169 (full: true, code: 0, id: 7339232485144659) 63 B
[2022-08-07T14:29:18.618+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x703e8affd41cc0b237e1d0c567fd617b3117b6c6 (full: true, code: 0, id: 8463698258389114) 63 B
[2022-08-07T14:29:18.618+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x703e8affd41cc0b237e1d0c567fd617b3117b6c6 (full: true, code: 0, id: 4844599646539813) 63 B
[2022-08-07T14:29:18.619+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x25ec5e43859d2d190365432a101996bdb8ca32f9 (full: true, code: 0, id: 2214130131510178) 63 B
[2022-08-07T14:29:18.619+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xd9d9e50effdcba285343ba2e140bc13e71d8492d (full: true, code: 0, id: 2035130933454096) 63 B
[2022-08-07T14:29:18.619+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xd9d9e50effdcba285343ba2e140bc13e71d8492d (full: true, code: 0, id: 2211619174068771) 63 B
[2022-08-07T14:29:18.619+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xf6e1480b1db373588778301da78bbd812179d170 (full: true, code: 0, id: 6472100418803255) 63 B
[2022-08-07T14:29:18.619+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xf6e1480b1db373588778301da78bbd812179d170 (full: true, code: 0, id: 303081219494899) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x7698e9ef4007e2e1f310c19c72a01dbf5a4c4d73 (full: true, code: 0, id: 7875190230218489) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x41df232dfa1fa62f9e8484da99550d795c32a8d6 (full: true, code: 0, id: 4459218865225318) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x41df232dfa1fa62f9e8484da99550d795c32a8d6 (full: true, code: 0, id: 7556429319388454) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x6702be6ff0cd0f988257178d34222820e56c8a35 (full: true, code: 0, id: 178122600704691) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xdd14ced2a2abf7f31b19dc47955f2ea1ace03410 (full: true, code: 0, id: 398670212034091) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x39bda034abe281c8da91aafa488bf6a618c9f97c (full: true, code: 0, id: 5746545490176437) 63 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x052877c2324abb1915d25acc3da69e11871f733d (full: true, code: 0, id: 5908337309515106) 340 B
[2022-08-07T14:29:18.620+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x052877c2324abb1915d25acc3da69e11871f733d (full: true, code: 0, id: 7978095033017549) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x076fb8ee3bc2d5c18f230f9e881627da9f32263e (full: true, code: 0, id: 6902876164071271) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xdd14ced2a2abf7f31b19dc47955f2ea1ace03410 (full: true, code: 0, id: 2513896895096711) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x79fda90eb37ab53304e13c6a263d51d3f517742a (full: true, code: 0, id: 4087688843710858) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x79fda90eb37ab53304e13c6a263d51d3f517742a (full: true, code: 0, id: 3232215812019447) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG actix_http::h1::dispatcher] cannot read request payload
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x076fb8ee3bc2d5c18f230f9e881627da9f32263e (full: true, code: 0, id: 273187208211221) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x052877c2324abb1915d25acc3da69e11871f733d (full: true, code: 0, id: 4134708044652694) 63 B
[2022-08-07T14:29:18.621+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x39bda034abe281c8da91aafa488bf6a618c9f97c (full: true, code: 0, id: 3388900850408645) 63 B
[2022-08-07T14:29:18.622+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x39bda034abe281c8da91aafa488bf6a618c9f97c (full: true, code: 0, id: 8347697870463342) 63 B
[2022-08-07T14:29:18.622+0200 INFO  actix_web::middleware::logger] 127.0.0.1 "GET /activity-api/v1/activity/fcc16b22a4094e6abd869151b8beff00/exec/d669b47047354feb9df33ade351bd002?timeout=5 HTTP/1.1" 200 420 "-" "OpenAPI-Generator//python" 5.775323
[2022-08-07T14:29:18.622+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xdd14ced2a2abf7f31b19dc47955f2ea1ace03410 (full: true, code: 0, id: 5269646202974384) 63 B
[2022-08-07T14:29:18.622+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0xdd14ced2a2abf7f31b19dc47955f2ea1ace03410 (full: true, code: 0, id: 3418244818785300) 63 B
[2022-08-07T14:29:18.622+0200 DEBUG ya_net::hybrid::service] Handle reply from node 0x0608c92b81a4d9e81f2cec7d852a406676208af3 (full: true, code: 0, id: 487497784287379) 63 B
[2022-08-07T14:29:18.623+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: received duplicate ACK for seq 2126112319 (duplicate nr 1)
[2022-08-07T14:29:18.624+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: reset duplicate ACK count
[2022-08-07T14:29:18.624+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: received duplicate ACK for seq 2126112935 (duplicate nr 1)
[2022-08-07T14:29:18.626+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: reset duplicate ACK count
[2022-08-07T14:29:18.626+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: received duplicate ACK for seq 3441562149 (duplicate nr 1)
[2022-08-07T14:29:18.629+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: reset duplicate ACK count
[2022-08-07T14:29:18.631+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:11634:34e1:21c:5827:6f5d:21ed:4c4:73b9:a459:1: reset duplicate ACK count
[2022-08-07T14:29:18.633+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: received duplicate ACK for seq 3289187968 (duplicate nr 1)
[2022-08-07T14:29:18.634+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: reset duplicate ACK count
[2022-08-07T14:29:18.635+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: reset duplicate ACK count
[2022-08-07T14:29:18.635+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: received duplicate ACK for seq 1989124261 (duplicate nr 1)
[2022-08-07T14:29:18.635+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: reset duplicate ACK count
[2022-08-07T14:29:18.636+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: received duplicate ACK for seq 2056692473 (duplicate nr 1)
[2022-08-07T14:29:18.638+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: reset duplicate ACK count
[2022-08-07T14:29:18.640+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: reset duplicate ACK count
[2022-08-07T14:29:18.640+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: received duplicate ACK for seq 724370765 (duplicate nr 1)
[2022-08-07T14:29:18.641+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:35761:608:c92b:81a4:d9e8:1f2c:ec7d:852a:4066:1: received duplicate ACK for seq 3441562764 (duplicate nr 1)
[2022-08-07T14:29:18.643+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: reset duplicate ACK count
[2022-08-07T14:29:18.643+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34832:25ec:5e43:859d:2d19:365:432a:1019:96bd:1: received duplicate ACK for seq 2126113550 (duplicate nr 1)
[2022-08-07T14:29:18.645+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:43701:dd14:ced2:a2ab:f7f3:1b19:dc47:955f:2ea1:1: received duplicate ACK for seq 3289188584 (duplicate nr 1)
[2022-08-07T14:29:18.647+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: reset duplicate ACK count
[2022-08-07T14:29:18.648+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:34447:d9d9:e50e:ffdc:ba28:5343:ba2e:140b:c13e:1: received duplicate ACK for seq 1989124880 (duplicate nr 1)
[2022-08-07T14:29:18.650+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: reset duplicate ACK count
[2022-08-07T14:29:18.650+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:4687:528:77c2:324a:bb19:15d2:5acc:3da6:9e11:1: received duplicate ACK for seq 724371381 (duplicate nr 1)
[2022-08-07T14:29:18.653+0200 DEBUG smoltcp::socket::tcp] tcp:c63d:5c10:3d64:d00e:6ce:8b62:4643:a590:8089:76f:b8ee:3bc2:d5c1:8f23:f9e:8816:27da:1: reset duplicate ACK count
[2022-08-07T14:29:18.654+0200 DEBUG smoltcp::socket::tcp] sending sACK option with current assembler ranges
[2022-08-07T14:29:18.655+0200 DEBUG smoltcp::socket::tcp] sending sACK option with current assembler ranges... (27 KB left)
wkargul commented 2 years ago

Task https://github.com/golemfactory/yagna/issues/2053 is about the same issue, therefore I closed it and this should be investigated&implemented here instead.

etam commented 1 year ago

Here's my story about timeouts:

First I noticed on grafana, that one of testnet checkers was quite regularly disappearing: Screenshot_20230113_131659 But the yagna process was running apparently fine this whole time.

Then I tried to use curl to get metrics from yagna API at such "disappearing" time, and indeed curl was waiting a long time and got timeout.

I found that a python script, that uses yapapi to talk to this yagna is getting quite a lot ProposalFailed(exception=TimeoutError():

DEBUG yapapi.events] ProposalFailed(exception=TimeoutError(), job=Job(id=1), proposal=<yapapi.rest.market.OfferProposal object at 0x7f4481a70780>)
Traceback (most recent call last):
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/yapapi/engine.py", line 926, in handler
    event = await self._handle_proposal(proposal_)
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/yapapi/engine.py", line 892, in _handle_proposal
    await proposal.respond(demand_builder.properties, demand_builder.constraints)
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/yapapi/rest/market.py", line 166, in respond
    self._subscription.id, self.id, proposal, _request_timeout=5
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/ya_market/api_client.py", line 201, in __call_api
    _request_timeout=_request_timeout,
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/ya_market/rest.py", line 276, in POST
    body=body,
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/ya_market/rest.py", line 170, in request
    r = await self.pool_manager.request(**args)
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 914, in start
    self._continue = None
  File "/home/golem-dev0/releases/yapapi-0.10.0/lib/python3.6/site-packages/aiohttp/helpers.py", line 720, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

During this time, using lsof -i -n -P | grep $yagna_api_port | wc -l, I could see that the number of connections was growing. (maybe it causes #1935 ?)

After 5-10 minutes this stops and everything gets back to normal. And later it repeats.

Example logs: logs.zip