golemfactory / clay

Golem is creating a global market for computing power.
https://golem.network
GNU General Public License v3.0

Problem downloading task resources on many nodes in one longer task #4791

Open ederenn opened 5 years ago

ederenn commented 5 years ago

Description

Golem Version: 0.21.0

Golem-Messages version (leave empty if unsure):

Electron version (if used): 0.2.3

OS [e.g. Windows 10 Pro]: Linux, Mac

Branch (if launched from source): -

Mainnet/Testnet: mainnet

Priority label is set to the lowest by default. To set a higher priority, please change the label:

  P0 label is set for Severity-Critical/Effort-easy
  P1 label is set for Severity-Critical/Effort-hard
  P2 label is set for Severity-Low/Effort-easy
  P3 label is set for Severity-Low/Effort-hard

Description of the issue:

Many nodes have problems downloading task resources. As a result, longer tasks (40 subtasks or more) have trouble finishing.

INFO     [golem.network.p2p.peersession      ] Starting peer session. address=10.30.11.251:45856
ERROR    [golem.resource.client              ] Error executing async, raising. count=1, method=<bound method HyperdriveAsyncClient.get_async of <HyperdriveAsyncClient hyperg at http://localhost:3292>>, args=(), kwargs={'content_hash': 'f28198b3258a3b4ec842cd2a001ff117', 'filename': '/home/ederenn/.local/share/golem/default/mainnet/ComputerRes/6e783968-eb59-11e9-97ac-5ad86f3605c2/tmp', 'filepath': '/home/ederenn/.local/share/golem/default/mainnet/ComputerRes/6e783968-eb59-11e9-97ac-5ad86f3605c2/tmp', 'client_options': <golem.network.hyperdrive.client.HyperdriveClientOptions object at 0x7fc2a0744cf8>}, exc=TimeoutError('',)
WARNING  [golem.resource.hyperdrive.resourcesmanager] Error downloading resource.path=/home/ederenn/.local/share/golem/default/mainnet/ComputerRes/6e783968-eb59-11e9-97ac-5ad86f3605c2/tmp/., hash=f28198b3258a3b4ec842cd2a001ff117, error=[Failure instance: Traceback: <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:568:_startRunCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:654:_runCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1475:gotResult
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1464:_inlineCallbacks
--- <exception caught here> ---
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
/home/buildbot-worker/worker/buildpackage_linux/build/golem/network/hyperdrive/client.py:261:_async_request
]
WARNING  [golem.task.server.helpers          ] Task result error: b29774c0-eb59-11e9-9765-5ad86f3605c2 ([Failure instance: Traceback: <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:568:_startRunCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:654:_runCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1475:gotResult
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1464:_inlineCallbacks
--- <exception caught here> ---
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/internet/defer.py:1416:_inlineCallbacks
/home/buildbot-worker/worker/buildpackage_linux/build/.venv/lib/python3.6/site-packages/twisted/python/failure.py:512:throwExceptionIntoGenerator
/home/buildbot-worker/worker/buildpackage_linux/build/golem/network/hyperdrive/client.py:261:_async_request
])

Actual result:

A lot of subtasks fail, and the task may end in a timeout.

Steps To Reproduce

Short description of steps to reproduce the behavior:

  1. Launch Golem
  2. Create a task: DiscBoyBlender, resolution and samples from file, 30 frames, 60 subtasks, 2.5 h for the task, 20 min per subtask, bid 0.1.
  3. Check logs for error
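
Step 3 can be partly automated. The sketch below counts `Error downloading resource` warnings per resource hash, assuming the log lines keep the format shown in the excerpt above. The function name `count_download_errors` is illustrative and not part of Golem.

```python
import re
from collections import Counter

# Matches the resourcesmanager warning format seen in the log excerpt above:
#   "... Error downloading resource.path=<path>, hash=<hex>, error=..."
DOWNLOAD_ERROR = re.compile(
    r"Error downloading resource\.path=(?P<path>.*?), hash=(?P<hash>[0-9a-f]+)"
)


def count_download_errors(lines):
    """Return a Counter mapping resource hash -> number of failed downloads."""
    counts = Counter()
    for line in lines:
        match = DOWNLOAD_ERROR.search(line)
        if match:
            counts[match.group("hash")] += 1
    return counts
```

To use it on a log file, open the file and pass the file object in, for example `count_download_errors(open("golem.log", errors="replace")).most_common()`, where the file name is only an example. A hash that appears many times suggests one resource that most providers cannot fetch, while many distinct hashes would suggest a broader connectivity problem.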

Logs and any additional context

Requestor: hyperg_2019-10-04_15-24-35.log, golem (2).log, golem.2019-10-09.log

Providers: https://drive.google.com/open?id=1-dLOXErY81SFKUfmIrmyV7eC2PLINw1k https://drive.google.com/open?id=1gxiPr7vcA-u56kYMi2akvobr7XIrGYEI

Proposed Solution?

(Optional: What could be a solution for that issue)

shadeofblue commented 4 years ago

@prekucki any idea if this could be improved in any way? Do we need to increase the timeouts?
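
For discussion, one alternative to a single larger timeout is to retry the download with a timeout that grows on each attempt. This is a minimal sketch under assumptions, not Golem's actual code: `fetch` is a hypothetical callable standing in for the resource download, and the builtin `TimeoutError` stands in for `twisted.internet.error.TimeoutError` from the logs above.

```python
import time


def fetch_with_retries(fetch, attempts=3, timeout=30.0, backoff=2.0,
                       sleep=time.sleep):
    """Call fetch(timeout=...) until it succeeds or attempts run out.

    `fetch` is a hypothetical stand-in for a resource download; it is
    expected to raise TimeoutError when the download times out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = 1.0
    for attempt in range(1, attempts + 1):
        try:
            return fetch(timeout=timeout)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(delay)        # wait before retrying
            delay *= backoff    # wait longer before each further retry
            timeout *= backoff  # and give the next attempt more time
```

Whether retries help depends on why the downloads time out. If the requestor's uplink is saturated by many providers fetching at once, retries with growing delays may spread the load, whereas a flat timeout increase only postpones the failure.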