cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Rerun a sample if it gets purged #464

Open: donkirkby opened this issue 5 years ago

donkirkby commented 5 years ago

Kive's purge job fell behind in our test environment, and then it ran a large purge while MiCall was rerunning a batch of old samples. That made it much more likely that a sample would be purged while the MiCall watcher was trying to download it.

I saw an error like this:

2019-03-20 13:25:55[WARNING]kiveapi._validate_response(): Error response 500 for https://testkive-int.cfenet.ubc.ca/api/datasets/789409/download/: <h1>Server Error (500)</h1>
2019-03-20 13:25:55[ERROR]micall.monitor.kive_watcher.wait_for_retry(): Waiting 0:00:20 before retrying.
Traceback (most recent call last):
  File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 92, in poll_runs
    is_finished = self.poll_sample_runs(sample_watcher)
  File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 147, in poll_sample_runs
    for run in (main_run, midi_run)
  File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 148, in <listcomp>
    if run and run['id'] in self.active_runs and not self.fetch_run_status(run)]
  File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 187, in fetch_run_status
    sample_watcher)
  File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 673, in fetch_run_status
    lambda: self.download_file(dataset_url + 'download/',
  File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 639, in kive_retry
    return target()
  File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 674, in <lambda>
    scratch_path / filename))
  File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 682, in download_file
    self.session.download_file(f, dataset_url)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 214, in download_file
    for block in self.download(*args, **kwargs).iter_content(1024):
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 206, in download
    return self.get(*args, **kwargs)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 198, in get
    is_json=is_json)
  File "/mnt/data/don/git/MiCall/venv_micall/lib/python3.6/site-packages/kiveapi/kiveapi.py", line 160, in _validate_response
    self.server_url))
kiveapi.errors.KiveServerException: Server error 500 on https://testkive-int.cfenet.ubc.ca.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/data/don/git/MiCall/micall/monitor/kive_watcher.py", line 422, in poll_runs
    folder_watcher.poll_runs()
  File "/mnt/data/don/git/MiCall/micall/monitor/sample_watcher.py", line 96, in poll_runs
    f'failed.') from ex
RuntimeError: Polling sample group PNL4-3-REH479-V3-2 failed.

The MiCall watcher checks for purged outputs before deciding to reuse an old run, so this sample's output must have been purged between the time the old run was found and the time the download was attempted.
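The race can be sketched with an in-memory stand-in for Kive. Everything here (`FakeDatasetStore`, `DatasetPurged`, `reuse_old_run`) is hypothetical and not part of MiCall or kiveapi; it only shows why checking for purged outputs up front can't rule out the failure, since a purge can land between the check and the download. Note that in the real log the server reported a generic 500, not a clear "purged" error.

```python
from dataclasses import dataclass, field


class DatasetPurged(Exception):
    """Hypothetical error: the dataset no longer exists on the server."""


@dataclass
class FakeDatasetStore:
    """Hypothetical in-memory stand-in for a Kive server."""
    datasets: dict = field(default_factory=dict)

    def is_purged(self, dataset_id: int) -> bool:
        return dataset_id not in self.datasets

    def purge(self, dataset_id: int) -> None:
        self.datasets.pop(dataset_id, None)

    def download(self, dataset_id: int) -> bytes:
        try:
            return self.datasets[dataset_id]
        except KeyError:
            raise DatasetPurged(dataset_id) from None


def reuse_old_run(store: FakeDatasetStore, dataset_id: int, purge_job=None) -> bytes:
    # Check first, like the watcher does before reusing an old run.
    if store.is_purged(dataset_id):
        raise DatasetPurged(dataset_id)
    # Simulate a purge job running between the check and the download.
    if purge_job is not None:
        purge_job()
    # The download can still fail even though the check passed.
    return store.download(dataset_id)
```

With `purge_job=lambda: store.purge(dataset_id)`, the check passes and the download still raises, so the failure has to be handled at download time rather than prevented by the earlier check.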

The workaround is to restart the MiCall watcher so it reruns the purged sample.

The fix would be to detect the problem and automatically rerun the purged sample. There was a related problem where a run's input dataset had been purged.
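One possible shape for the fix, sketched under loose assumptions: when a download fails with a server error, ask again whether the output was purged, and if so start a replacement run instead of treating it as a transient failure. The `download`, `is_purged`, and `start_new_run` hooks and the `ServerError` class are hypothetical placeholders, not MiCall or kiveapi APIs.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class ServerError(Exception):
    """Hypothetical stand-in for a generic server failure such as HTTP 500."""


def fetch_or_rerun(download: Callable[[], bytes],
                   is_purged: Callable[[], bool],
                   start_new_run: Callable[[], str]) -> Optional[bytes]:
    """Download a run's output; if it was purged, start a new run instead.

    Returns the downloaded bytes, or None if a replacement run was started.
    """
    try:
        return download()
    except ServerError:
        # A 500 alone doesn't say why the download failed, so check again
        # whether the dataset still exists before deciding what to do.
        if not is_purged():
            raise  # Not purged: leave it to the normal retry handling.
    # The output is gone, so rerunning is the only way to recover it.
    new_run_id = start_new_run()
    logger.warning('Output was purged; started replacement run %s.', new_run_id)
    return None
```

Re-checking after the failure matters because a purged dataset surfaced only as a generic 500, which looks the same as a transient server problem.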

Is there any benefit to using the rerun feature instead of creating a new run?

Donaim commented 1 year ago

Related to #921 ?