Closed ederenn closed 5 years ago
This bug occurs because the preview file is handled in two separate threads:
/golem/task/taskserver.py", line 140, in sync_network
/golem/task/taskserver.py", line 881, in __remove_old_tasks
/golem/task/taskmanager.py", line 787, in check_timeouts
/golem/core/common.py", line 147, in func_wrapper
/apps/rendering/task/framerenderingtask.py", line 131, in computation_failed
/apps/rendering/task/framerenderingtask.py", line 338, in _update_frame_task_preview
and
/golem/network/transport/tcpnetwork.py", line 366, in _interpret
/golem/task/tasksession.py", line 123, in interpret
/golem/network/transport/session.py", line 76, in interpret
/golem/task/tasksession.py", line 485, in _react_to_want_to_compute_task
/golem/task/taskmanager.py", line 396, in get_next_subtask
/apps/blender/task/blenderrendertask.py", line 484, in query_extra_data
/apps/rendering/task/framerenderingtask.py", line 338, in _update_frame_task_preview
The preview file can be corrupted this way, and PIL then reports that it cannot identify the image file.
In general, removing tasks in Golem is very poorly handled because of the Python "easier to ask for forgiveness than permission" guideline: instead of ensuring that it is safe to delete a task/subtask, we just delete it and then handle a lot of the errors that this causes everywhere in Golem's data structures.
We can go further down that path and just provide proper locking of this file, or do it right this time.
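For reference, the locking option could look roughly like the sketch below. It is only an illustration, not Golem code: `update_preview`, `_lock_for` and the `render` callback are hypothetical names, and it assumes both threads run in the same process. Each update holds a per-path lock for the whole read-modify-write, and the new file is swapped in with `os.replace`, so a reader should never see a half-written preview.

```python
import os
import tempfile
import threading

# Hypothetical helpers, not part of Golem: one lock per preview path.
_preview_locks = {}
_preview_locks_guard = threading.Lock()


def _lock_for(path):
    """Return the lock guarding `path`, creating it on first use."""
    with _preview_locks_guard:
        return _preview_locks.setdefault(path, threading.Lock())


def update_preview(path, render):
    """Serialize read-modify-write of a preview file and publish it atomically.

    `render` is a hypothetical callback: it receives the current file bytes
    (or None if the file does not exist yet) and returns the new bytes.
    """
    with _lock_for(path):
        try:
            with open(path, "rb") as f:
                current = f.read()
        except FileNotFoundError:
            current = None
        data = render(current)
        # Write to a temp file in the same directory, then swap it in, so
        # the preview on disk is always either the old or the new version.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise
```

Both call paths from the tracebacks would then go through something like `update_preview(preview_path, paste_subtask_result)` instead of opening the file directly. This only hides the symptom; it does not address why a task that is still being computed gets touched by the cleanup path.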
hmm... maybe the more interesting question is: why does __remove_old_tasks touch the same task that another node just issued a WantToComputeTask for (which suggests the task is still being computed)...
Because it's marking a subtask as timed out. Yeah, it's not something you would expect in __remove_old_tasks.
So, every LoopingCallService runs in a separate thread. Can we move to Rust quicker? There, a data race is a compilation error.
We're fixing a bigger issue here: #3660
Encountered on:
Mac OS, GOLEM Version: 0.17.0+dev99.g6d235f1
during computation of a task: helicopter.blend, 400x400, 10 frames, 30 subtasks; 4 subtasks were left uncomputed
in golem.log from the res folder: