Kive's purge job fell behind in our test environment, and then it ran a large purge while MiCall was rerunning a bunch of old samples. That made it much more likely that a sample would be purged while the MiCall watcher was trying to download it.
I saw an error like this:
2019-03-20 13:25:55[WARNING]kiveapi._validate_response(): Error response 500 for https://testkive-int.cfenet.ubc.ca/api/datasets/789409/download/: <h1>Server Error (500)</h1>
2019-03-20 13:25:55[ERROR]micall.monitor.kive_watcher.wait_for_retry(): Waiting 0:00:20 before retrying.
Traceback (most recent call last):
File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 92, in poll_runs
is_finished = self.poll_sample_runs(sample_watcher)
File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 147, in poll_sample_runs
for run in (main_run, midi_run)
File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 148, in <listcomp>
if run and run['id'] in self.active_runs and not self.fetch_run_status(run)]
File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 187, in fetch_run_status
sample_watcher)
File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 673, in fetch_run_status
lambda: self.download_file(dataset_url + 'download/',
File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 639, in kive_retry
return target()
File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 674, in <lambda>
scratch_path / filename))
File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 682, in download_file
self.session.download_file(f, dataset_url)
File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 214, in download_file
for block in self.download(*args, **kwargs).iter_content(1024):
File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 206, in download
return self.get(*args, **kwargs)
File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 198, in get
is_json=is_json)
File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 160, in _validate_response
self.server_url))
kiveapi.errors.KiveServerException: Server error 500 on https://testkive-int.cfenet.ubc.ca.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 422, in poll_runs
folder_watcher.poll_runs()
File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 96, in poll_runs
f'failed.') from ex
RuntimeError: Polling sample group PNL4-3-REH479-V3-2 failed.
The MiCall watcher checks for purged outputs before deciding to reuse an old run, so the output must have been purged between the time the run was found and the time the download was attempted.
The workaround is to restart the MiCall watcher so it reruns the purged sample.
The fix would be to detect the problem and automatically rerun the purged sample. There was a related problem where a run's input dataset had been purged.
[ ] Check for purged output before download, or detect failure. Rerun sample.
[ ] Check for purged input before requesting run. Rerun source run.
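A rough sketch of the "detect failure, rerun sample" option from the first item, just to make the idea concrete. Everything here is hypothetical: `ServerError` stands in for `kiveapi.errors.KiveServerException`, and `download`, `is_purged`, and `start_run` are placeholder callables, not existing MiCall or Kive API functions. It assumes the watcher can ask Kive whether an output was purged after a download fails.

```python
class ServerError(Exception):
    """Stand-in for kiveapi.errors.KiveServerException in this sketch."""


def download_or_rerun(run, download, is_purged, start_run):
    """Download a run's output, or start a replacement run if it was purged.

    All three callables are hypothetical placeholders:
      download(run)     -- fetches the output, raises ServerError on a 500
      is_purged(run)    -- asks the server whether the output was purged
      start_run(sample) -- launches a fresh run for the sample

    Returns (run, downloaded): the run to keep tracking, and whether its
    output was downloaded on this attempt.
    """
    try:
        download(run)
        return run, True
    except ServerError:
        if not is_purged(run):
            # Some other server problem: leave it to the existing retry logic.
            raise
        # The output vanished between the reuse check and the download,
        # so replace the old run instead of retrying the same download.
        return start_run(run['sample']), False
```

Checking for the purge only after the download fails avoids an extra request on every download, and it also closes the gap between checking and downloading that caused this error.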
Is there any benefit to using the rerun feature instead of creating a new run?