EarthScope / rover

ROVER: robust data access tool for FDSN data centers
https://earthscope.github.io/rover/

Critical error logs truncate #82

Closed timronan closed 5 years ago

timronan commented 5 years ago

While working on issue #63, we found that ROVER's critical error logs truncate stack traces. We need complete stack traces from critical errors so that we can find and fix bugs more readily.

Example: Logged Stack Trace.

CRITICAL 2019-03-08 07:53:58,535: Traceback (most recent call last):
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/__init__.py", line 106, in main
    execute(config.command, config)
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/__init__.py", line 85, in execute
    commands[command][0](config).run(config.args)
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/retrieve.py", line 213, in run
    return self.do_run(args, True, RETRIEVE)
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/retrieve.py", line 152, in do_run
    return self._fetch()
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/retrieve.py", line 180, in _fetch
    n_downloads = self._download_manager.download()
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/manager.py", line 883, in download
    self.step(quiet=False)
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/manager.py", line 853, in step
    self._clean_sources(quiet=quiet)
  File "/opt/devops/py2/lib/python2.7/site-packages/rover/manager.py", line 810, in _clean_sources
    raise e
ZeroDivisionError: float division by zero

Actual Stack Trace:

CRITICAL 2019-03-12 15:03:16,303: Traceback (most recent call last):
   File "/Users/tronan/Desktop/Projects/rover/rover/__init__.py", line 97, in main
     execute(config.command, config)
   File "/Users/tronan/Desktop/Projects/rover/rover/__init__.py", line 76, in execute
     commands[command][0](config).run(config.args)
   File "/Users/tronan/Desktop/Projects/rover/rover/retrieve.py", line 206, in run
     return self.do_run(args, True, RETRIEVE)
   File "/Users/tronan/Desktop/Projects/rover/rover/retrieve.py", line 145, in do_run
     return self._fetch()
   File "/Users/tronan/Desktop/Projects/rover/rover/retrieve.py", line 173, in _fetch
     n_downloads = self._download_manager.download()
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 877, in download
     self.step(quiet=False)
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 847, in step
     self._clean_sources(quiet=quiet)
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 804, in _clean_sources
     raise e
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 799, in _clean_sources
     complete = self._source(name).is_complete()
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 421, in is_complete
     complete = self._is_complete_initial_reads(retry_possible)
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 467, in _is_complete_initial_reads
     self._new_retrieval(True)
   File "/Users/tronan/Desktop/Projects/rover/rover/manager.py", line 599, in _new_retrieval
     required = remote.subtract(local)
   File "/Users/tronan/Desktop/Projects/rover/rover/coverage.py", line 111, in subtract
     other.join()
   File "/Users/tronan/Desktop/Projects/rover/rover/coverage.py", line 55, in join
     joined, (tolerance, increment) = [], self.tolerances()
   File "/Users/tronan/Desktop/Projects/rover/rover/coverage.py", line 93, in tolerances
     return self._frac_tolerance / self.samplerate, self._frac_increment / self.samplerate
 ZeroDivisionError: float division by zero

chad-earthscope commented 5 years ago

This traceback truncation appears to occur only with Python 2. At least, that was the case when comparing Python 2.7.15 with 3.6.7 on Linux (Ubuntu 18.04). It is repeatable with the test case in #63 when using a version from before the commit that fixed that issue.

chad-earthscope commented 5 years ago

Resolution: the truncation occurred because the exception was raised inside a nested try/except block and then re-raised with "raise e". In Python 2, "raise e" starts a new traceback at the point of the re-raise, so only the "outer" stack trace was shown. The solution is to re-raise with a bare "raise", which keeps the original traceback so that all details, including frames from nested exception handling, flow to the place where the exception is caught.
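
For illustration, here is a minimal sketch of the bare "raise" pattern described above. It is not ROVER's actual code: the function names `tolerances`, `clean_sources`, and `main` are hypothetical stand-ins loosely modelled on the frames in the logged trace. The handler re-raises with a bare `raise`, and the formatted traceback still includes the innermost frame where the division failed.

```python
import traceback


def tolerances(samplerate):
    # Hypothetical stand-in for the failing division in the logged trace.
    return 0.5 / samplerate


def clean_sources(samplerate):
    try:
        return tolerances(samplerate)
    except Exception:
        # A bare `raise` re-raises the active exception with its original
        # traceback. Under Python 2, writing `raise e` here instead would
        # restart the traceback at this line and drop the `tolerances` frame.
        raise


def main():
    try:
        clean_sources(0.0)
    except Exception:
        # Capture the full traceback as text, as a logger would record it.
        return traceback.format_exc()


report = main()
print(report)
```

Under Python 3 the exception object carries its own traceback, so "raise e" does not lose the inner frames there, which is consistent with the truncation being seen only under Python 2. The bare "raise" behaves the same in both versions.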