sapphire RuntimeError causes fatal crash of updatehistograms job

tomkooij commented 4 years ago

On Oct 30 (the updatehistograms run for the Oct 29 data), sapphire raised an error while reconstructing events. It was caused by s9 uploading an invalid config, and it was resolved automatically when s9 uploaded a new config.

However, this crashes the entire updatehistograms job. Should we wrap these calls in try/except to prevent this, or should this be solved somewhere else?
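A minimal sketch of the try/except approach: run each summary's task inside its own try block, so one bad station config is logged (with its traceback) and skipped instead of ending the whole run. The helper `run_tasks_isolated` and the demo task `fake_reconstruct` are hypothetical stand-ins, not publicdb's actual `perform_tasks_manager`, and plain strings stand in for Summary objects.

```python
import logging

logger = logging.getLogger(__name__)


def run_tasks_isolated(summaries, task):
    """Run ``task`` on each summary, logging and skipping failures.

    Hypothetical helper: an exception raised while processing one
    summary is logged with its traceback instead of aborting the loop.
    Returns the list of summaries that failed.
    """
    failed = []
    for summary in summaries:
        try:
            task(summary)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback
            logger.exception("Task failed for %s, skipping", summary)
            failed.append(summary)
    return failed


# Demo: the first summary fails the way s9 did; the second still runs.
logging.basicConfig(level=logging.INFO)
completed = []


def fake_reconstruct(summary):
    """Hypothetical task standing in for the event reconstruction."""
    if summary.startswith("9 "):
        raise IndexError("list index out of range")
    completed.append(summary)


failed = run_tasks_isolated(["9 - 29 Oct 2019", "501 - 29 Oct 2019"],
                            fake_reconstruct)
```

Catching `Exception` rather than using a bare `except:` keeps KeyboardInterrupt and SystemExit able to stop the job. Whether a failed summary should keep its `needs_update_events` flag so it is retried on the next run is a separate design choice this sketch does not address.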

The IndexError (list index out of range) was raised at: https://github.com/HiSPARC/sapphire/blob/master/sapphire/analysis/core_reconstruction.py#L69

Because s9 uploaded a config without slave data while the station has 4 detectors, indexing self.station.detectors[id] ran past the end of the detector list.
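On the sapphire side, a guard could turn the bare IndexError into an error that names the mismatch between the configured detectors and the requested detector ids. This is a minimal sketch, not sapphire's code: `Detector` is a hypothetical stand-in that only mimics the `get_coordinates()` call seen in the traceback, and `detector_offsets` is a hypothetical helper.

```python
class Detector(object):
    """Hypothetical stand-in for a detector; only mimics get_coordinates()."""

    def __init__(self, x, y, z):
        self._coordinates = (x, y, z)

    def get_coordinates(self):
        return self._coordinates


def detector_offsets(detectors, detector_ids):
    """Return the (x, y, z) coordinates of each requested detector.

    Hypothetical guard: check the requested ids against the detectors
    defined by the station config before indexing, so a mismatch raises
    a descriptive ValueError instead of a bare IndexError.
    """
    missing = [i for i in detector_ids if not 0 <= i < len(detectors)]
    if missing:
        raise ValueError(
            "station config defines %d detectors, but detector ids %s were "
            "requested (missing slave data in the config?)"
            % (len(detectors), missing))
    return [detectors[i].get_coordinates() for i in detector_ids]


# Demo: a master-only config (2 detectors) for a 4-detector reconstruction.
master_only = [Detector(0.0, 0.0, 0.0), Detector(10.0, 0.0, 0.0)]
message = None
try:
    detector_offsets(master_only, [0, 1, 2, 3])
except ValueError as exc:
    message = str(exc)
```

The `0 <= i` check also matters because a negative id would silently index from the end of the list instead of failing. The caller still has to decide whether to catch the error or skip the station.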

Log:


DEBUG:publicdb.histograms.jobs:Determining detector timing offsets for Summary: 9 - 29 Oct 2019
DEBUG:publicdb.histograms.jobs:Saving detector timing offsets for Summary: 9 - 29 Oct 2019
DEBUG:publicdb.histograms.jobs:Saved succesfully
ERROR:sentry.errors.serializer:the file object is closed
Traceback (most recent call last):
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/raven/utils/serializer/manager.py", line 76, in transform
    return repr(value)
  File "tables/tableextension.pyx", line 1634, in tables.tableextension.Row.__repr__
  File "tables/tableextension.pyx", line 1626, in tables.tableextension.Row.__str__
  File "tables/tableextension.pyx", line 746, in tables.tableextension.Row.table.__get__
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/tables/file.py", line 2159, in _check_open
    raise ClosedFileError("the file object is closed")
ClosedFileError: the file object is closed
ERROR:sentry.errors.serializer:the file object is closed
Traceback (most recent call last):
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/raven/utils/serializer/manager.py", line 76, in transform
    return repr(value)
  File "tables/tableextension.pyx", line 1634, in tables.tableextension.Row.__repr__
  File "tables/tableextension.pyx", line 1626, in tables.tableextension.Row.__str__
  File "tables/tableextension.pyx", line 746, in tables.tableextension.Row.table.__get__
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/tables/file.py", line 2159, in _check_open
    raise ClosedFileError("the file object is closed")
ClosedFileError: the file object is closed
Traceback (most recent call last):
  File "/srv/publicdb/www/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
    utility.execute()
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 356, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/srv/publicdb/www/publicdb/histograms/management/commands/updatehistograms.py", line 23, in handle
    completed = update_all_histograms()
  File "/srv/publicdb/www/publicdb/histograms/jobs.py", line 61, in update_all_histograms
    perform_update_tasks()
  File "/srv/publicdb/www/publicdb/histograms/jobs.py", line 84, in perform_update_tasks
    update_histograms()
  File "/srv/publicdb/www/publicdb/histograms/jobs.py", line 201, in update_histograms
    perform_tasks_manager(Summary, "needs_update_events", perform_events_tasks)
  File "/srv/publicdb/www/publicdb/histograms/jobs.py", line 247, in perform_tasks_manager
    summary, tmp_locations = perform_certain_tasks(summary)
  File "/srv/publicdb/www/publicdb/histograms/jobs.py", line 265, in perform_events_tasks
    tmp_locations.append(esd.reconstruct_events_and_store_temporary_esd(summary))
  File "/srv/publicdb/www/publicdb/histograms/esd.py", line 174, in reconstruct_events_and_store_temporary_esd
    reconstruct.reconstruct_and_store()
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/sapphire/analysis/reconstructions.py", line 116, in reconstruct_and_store
    self.reconstruct_cores(detector_ids=detector_ids)
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/sapphire/analysis/reconstructions.py", line 147, in reconstruct_cores
    self.progress, initials)
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/sapphire/analysis/core_reconstruction.py", line 100, in reconstruct_events
    for event, initial in events_init]
  File "/srv/publicdb/publicdb_venv/lib/python2.7/site-packages/sapphire/analysis/core_reconstruction.py", line 69, in reconstruct_event
    dx, dy, dz = self.station.detectors[id].get_coordinates()
IndexError: list index out of range
Sentry is attempting to send 1 pending error messages
Waiting up to 10 seconds
Press Ctrl-C to quit

(Ignore the Sentry errors; it's the IndexError that caused the crash.) The error is also logged at sentry.io.

davidfokkema commented 4 years ago

It's always better to catch errors and log them, but since the Raspberry Pi at s9 will be replaced with a Windows 10 PC, it is very unlikely that we will encounter this problem again. Also, I don't think this problem has occurred during the past few years?

tomkooij commented 4 years ago

@davidfokkema: According to sentry.io, this problem (at other stations) has occurred 4 times in the past few weeks.

We have also encountered it quite frequently when adding new stations.

But I'll just fix the sapphire side and leave the publicdb jobs alone as long as these errors do not occur frequently (if it ain't broke, don't fix it).

tomkooij commented 4 years ago

I thought I had opened this over at hisparc/publicdb... oops. This is a publicdb.histograms.jobs issue.

I'll create a new issue #192 to describe the problem in sapphire.

HiSPARC / sapphire

sapphire RuntimeError causes fatal crash of updatehistograms job #191