MKV21 closed this issue 9 years ago
The timer does not help; an HTTP download measurement can still get stuck. There has to be a bug between taskexecuture.cpp#33 and taskexecuture.cpp#92/102/121, which leaves the following code:
It cannot be in the result method of HttpDownload.
How exactly did you reproduce this bug/behaviour?
Unfortunately, we cannot reproduce this bug at will. It seems to happen randomly to almost all probes after they have run for 2-48 hours (executing the HTTP download 12 times per hour). My own probe and Rolf's probe at our homes seem to crash very quickly, so my best guess is that it has something to do with the download speed (we both have 100 Mbit Internet) when the probe is installed on a low-powered machine. rack41, which is a server with 200 Mbit Internet, does not crash as quickly, but if I remember correctly it also crashed at some point.
I have added a new log message to see whether this happens within the HTTP download or in the TaskExecutor (but I have to wait until I am home to check the log).
I guess we have two bugs there:
1 should be fixed (but is not committed yet).
Has #310 resolved this issue?
Resolved by #310 (hopefully).
aaaaaaaaaaand it happened again. Two probes crashed somewhere in HTTPDownload::calculateResults() after num_threads++.
"21:17:29" "INFO" "void InternalTaskExecutor::execute(const ScheduleDefinition&, MeasurementObserver*)" : "Starting execution of httpdownload"
QObject::moveToThread: Cannot move objects with a parent
QObject::moveToThread: Cannot move objects with a parent
QObject::moveToThread: Cannot move objects with a parent
QObject::moveToThread: Cannot move objects with a parent
"21:17:29" "INFO" "bool HTTPDownload::startThreads(const QHostInfo&)" : "Started 4 threads"
It is the same as with #233: the measurement starts and never comes back, which blocks the whole application forever.
I have reviewed the code and fixed some minor issues, but that did not fix this. There might be a bug in the counters (connectedThreads, finishedThread, downloadingThread, ...): if they are not updated correctly in an error case, the measurement might stall while waiting for a thread that has already finished. The counting logic is complex, but I could not find anything wrong with it.
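To illustrate the kind of counter bug suspected here, a minimal sketch in plain C++ (not the actual HTTPDownload code): a scope guard increments a "finished" counter on every exit path of a worker, so an early return in an error case cannot leave the measurement waiting for a thread that is already gone. FinishedGuard, worker and runWorkers are hypothetical names, and the workers are called sequentially here only to keep the sketch short.

```cpp
#include <atomic>

// Hypothetical stand-in for a counter such as finishedThread.
// The destructor runs on every exit path of the enclosing scope,
// including early returns in error cases.
struct FinishedGuard {
    std::atomic<int>& finished;
    explicit FinishedGuard(std::atomic<int>& f) : finished(f) {}
    ~FinishedGuard() { finished.fetch_add(1); }
};

// Hypothetical worker: `fail` simulates a connection error.
void worker(std::atomic<int>& finished, bool fail) {
    FinishedGuard guard(finished);
    if (fail) {
        return;  // error path: the guard still counts this worker
    }
    // ... download work would go here ...
}

// Runs `total` workers, the first `failing` of which hit the error path,
// and returns how many were counted as finished.
int runWorkers(int total, int failing) {
    std::atomic<int> finished{0};
    for (int i = 0; i < total; ++i) {
        worker(finished, i < failing);
    }
    return finished.load();
}
```

With this pattern runWorkers(4, 2) counts all four workers, whereas a manual increment placed only at the end of the success path would count two and leave the caller waiting for the rest.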
I am going to apply a temporary fix today: a timer that stops the measurement after one minute. But if the measurement blocks somewhere and does not return to the event loop, this will not fix the problem.
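For the case where the measurement never returns to the event loop, a timer in the same thread cannot fire. One possible alternative, sketched here in plain standard C++ rather than Qt, is to run the measurement in its own thread and wait for it with a timeout from outside. runWithWatchdog is a hypothetical helper, not part of the project. Note that this only detects the hang: the stuck thread keeps running, and the caller still has to decide what to do (for example, restart the process).

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs `measurement` in a detached thread and
// returns true if it finished within `timeout`, false otherwise.
// The waiting side does not depend on the measurement's own event loop.
bool runWithWatchdog(std::function<void()> measurement,
                     std::chrono::milliseconds timeout) {
    std::packaged_task<void()> task(std::move(measurement));
    // A future obtained from a packaged_task does not block in its
    // destructor, so a hung measurement cannot block the caller here.
    std::future<void> done = task.get_future();
    std::thread(std::move(task)).detach();
    return done.wait_for(timeout) == std::future_status::ready;
}
```

A call such as runWithWatchdog(startMeasurement, std::chrono::minutes(1)), with startMeasurement standing in for whatever starts the download, would return false for a hung measurement instead of blocking forever.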