EarthScope / rover

ROVER: robust data access tool for FDSN data centers
https://earthscope.github.io/rover/
Other
10 stars 1 forks source link

Ungraceful quit when pressing Control-C during a retrieve #46

Closed chad-earthscope closed 6 years ago

chad-earthscope commented 6 years ago

After a service outage at the DMC, a long running rover retrieve was spinning out errors, such as:

...
retrieve    ERROR: Download failed (return code 1)
retrieve  DEFAULT: Downloading TA_G30A 2010-246 (N_S 246/636; day 246/366)
download.13479    ERROR: ('Connection aborted.', BadStatusLine("''",))
download.13479    ERROR: Will log response contents (max 10 lines) here:
download.13479 CRITICAL: [Errno 2] No such file or directory: '/opt/devops/datarepo/tmp/rover_download_d7d7e3_13479'
download.13479  DEFAULT: See "rover help download"
...

After pressing Control-C on the console, nothing apparently happened, if there was a message on the screen it went by quickly.

Pressing Control-C many more times resulting in a numerous Trackbacks, ala:

Traceback (most recent call last): 
  File "/opt/devops/env2/lib/python2.7/site-packages/rover/__init__.py", line 72, in main
  File "/opt/devops/env2/bin/rover", line 11, in <module>
    sys.exit(main())
  File "/opt/devops/env2/lib/python2.7/site-packages/rover/__init__.py", line 72, in main

Finally, after some a minute or so, rover quit.

The TA_2012.req contains:

TA * * * 2010-01-01 2011-01-01

Rover needs to be interrupt-able and restartable as it is expected to run for long periods of time.

chad-earthscope commented 6 years ago

The "inability to restart" aspect of this issue has been moved to #47.

chad-earthscope commented 6 years ago

The original reported behavior was on CentOS (Linux). I tried to duplicate this on macOS and a single Control-C (SIGINT) quickly shut down rover while it was doing multiple retrieves. Trying again in Linux, it quickly shut down but once left rover in a non-restartable state due to empty files, leading to #48.

Thus far I have not been able to reproduce the long, drawn out wait for a shutdown, perhaps that was an anomaly.

Regardless, the shutdown is never graceful as there is always at least one Traceback, in some cases many of them, dumped to the console. Users should not be shown a Traceback when shutting down a rover retrieve in the most common way one would shutdown a program run on the command line.

andrewcooke-isti commented 6 years ago

Fixed in 3d113c8

chad-earthscope commented 6 years ago

More ungraceful stack traces, but no delays:

...
retrieve  DEFAULT: Downloading TA_113A 2010-008 (N_S 1/420; day 8/10)
retrieve  DEFAULT: Downloading TA_113A 2010-009 (N_S 1/420; day 9/10)
retrieve  DEFAULT: Downloading TA_113A 2010-010 (N_S 1/420; day 10/10)
^CTraceback (most recent call last):
  File "/---/bin/rover", line 7, in <module>
    from rover import main
  File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
    from .utils import create_parents, canonify, check_cmd
  File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
    from requests import Session
  File "/---/lib/python3.6/site-packages/requests/__init__.py", line 84, in <module>
    from urllib3.contrib import pyopenssl
  File "/---/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 48, in <module>
    from cryptography.hazmat.backends.openssl import backend as openssl_backend
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
    from cryptography.hazmat.backends.openssl.backend import backend
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 59, in <module>
    from cryptography.hazmat.primitives.ciphers.algorithms import (
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/algorithms.py", line 25, in <module>
    class AES(object):
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
...
retrieve  DEFAULT: Downloading TA_121A 2010-001 (N_S 2/420; day 1/10)
retrieve  DEFAULT: Downloading TA_121A 2010-002 (N_S 2/420; day 2/10)
retrieve  DEFAULT: Downloading TA_121A 2010-003 (N_S 2/420; day 3/10)
retrieve  DEFAULT: Downloading TA_121A 2010-004 (N_S 2/420; day 4/10)
^CTraceback (most recent call last):
  File "/---/bin/rover", line 7, in <module>
    from rover import main
  File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
    from .utils import create_parents, canonify, check_cmd
  File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
    from requests import Session
  File "/---/lib/python3.6/site-packages/requests/__init__.py", line 84, in <module>
    from urllib3.contrib import pyopenssl
  File "/---/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 48, in <module>
    from cryptography.hazmat.backends.openssl import backend as openssl_backend
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
    from cryptography.hazmat.backends.openssl.backend import backend
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 24, in <module>
    from cryptography.hazmat.backends.openssl.ciphers import _CipherContext
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/ciphers.py", line 9, in <module>
    from cryptography.hazmat.primitives import ciphers
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/__init__.py", line 7, in <module>
    from cryptography.hazmat.primitives.ciphers.base import (
  File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/base.py", line 21, in <module>
    class CipherAlgorithm(object):
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
chad-earthscope commented 6 years ago
...
retrieve  DEFAULT: Downloading TA_124A 2010-010 (N_S 3/419; day 10/10)
retrieve  DEFAULT: Downloading TA_125A 2010-001 (N_S 4/419; day 1/10)
retrieve  DEFAULT: Downloading TA_125A 2010-002 (N_S 4/419; day 2/10)
^CTraceback (most recent call last):
Traceback (most recent call last):
  File "/---/bin/rover", line 7, in <module>
  File "/---/bin/rover", line 7, in <module>
    from rover import main
  File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
    from rover import main
  File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
    from .utils import create_parents, canonify, check_cmd
  File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
    from requests import Session
    from .utils import create_parents, canonify, check_cmd
  File "/---/lib/python3.6/site-packages/requests/__init__.py", line 44, in <module>
  File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
    from requests import Session
  File "/---/lib/python3.6/site-packages/requests/__init__.py", line 44, in <module>
    import chardet
  File "/---/lib/python3.6/site-packages/chardet/__init__.py", line 20, in <module>
    import chardet
  File "/---/lib/python3.6/site-packages/chardet/__init__.py", line 20, in <module>
    from .universaldetector import UniversalDetector
  File "/---/lib/python3.6/site-packages/chardet/universaldetector.py", line 47, in <module>
    from .universaldetector import UniversalDetector
  File "/---/lib/python3.6/site-packages/chardet/universaldetector.py", line 47, in <module>
    from .mbcsgroupprober import MBCSGroupProber
    from .mbcsgroupprober import MBCSGroupProber
  File "/---/lib/python3.6/site-packages/chardet/mbcsgroupprober.py", line 31, in <module>
  File "/---/lib/python3.6/site-packages/chardet/mbcsgroupprober.py", line 36, in <module>
    from .cp949prober import CP949Prober
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
    from .utf8prober import UTF8Prober
  File "/---/lib/python3.6/site-packages/chardet/utf8prober.py", line 31, in <module>
    from .mbcssm import UTF8_SM_MODEL
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 764, in get_code
  File "<frozen importlib._bootstrap_external>", line 758, in get_code
  File "<frozen importlib._bootstrap_external>", line 832, in get_data
  File "<frozen importlib._bootstrap_external>", line 842, in path_stats
KeyboardInterrupt
  File "<frozen importlib._bootstrap_external>", line 82, in _path_stat
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
retrieve    ERROR: Download failed (return code 1)
andrewcooke-isti commented 6 years ago

these are all(?) hitting ctrl-C as soon as you start? i played around and that's the only way i could repeat them (it wasn't til i looked at the stack trace that i worked out that was the possibility).

although because rover starts up processes quite often i guess it could happen at other times too.

anyway, i don't know how to fix these. as far as i can tell they occur when hitting Ctrl-C while Python is still "starting up" and before "my" code "takes control".

one idea is that, since they all appear to be in crypto related calls related to HTTP (request), that we do that import locally (inside the scope of a function). i'll try that and see if it helps. but it's only patching a particular case. in general i don't know how to fix this (although i'll google round).

chad-earthscope commented 6 years ago

these are all(?) hitting ctrl-C as soon as you start?

These are either after waiting 5 seconds, or 20 seconds or 60 second or 120 second, ish. Basically, let it get going and cancel. All of them are after downloads have started.

andrewcooke-isti commented 6 years ago

weird. ok, still looking.

andrewcooke-isti commented 6 years ago

can you try the latest commit (just now) - 789e054?

not trying to be difficult here, but really not seeing this except if i hammer on Ctrl-C at startup.

all the above were while importing request's Session - i've put requests and smtp_llib imports in-line so that any exception thrown can be caught and managed. for me it seems to have made things a little better, but i can still get problems in other libraries. curious how it is for you.

andrewcooke-isti commented 6 years ago

sorry commit 590d9f2

andrewcooke-isti commented 6 years ago

there's some discussion of this problem here. wondering if there's any way to set a different handler.

andrewcooke-isti commented 6 years ago

ok, try f7c12bf - i've replaced the interrupt handler. this fixes things for me and doesn't seem to have any adverse effects, but please don't consider this final - i'm running tests now. submitted in case you're still around and can give feedback.

chad-earthscope commented 6 years ago

At first, from both 789e05464447e9bcb8fc8aa8b9302eb30cfeeae5 and 590d9f26c81ac21f2968b39254d2fb772f030d8b it seemed to have gotten a lot better. But then I got these:

$ ./dev/rover -f dr/rover.config retrieve req.txt 
retrieve  DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve  DEFAULT: Status available at http://127.0.0.1:8000
retrieve  DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve  DEFAULT: Downloading TA_129A 2010-001 (N_S 1/412; day 1/9)
retrieve  DEFAULT: Downloading TA_129A 2010-003 (N_S 1/412; day 2/9)
retrieve  DEFAULT: Downloading TA_129A 2010-004 (N_S 1/412; day 3/9)
retrieve  DEFAULT: Downloading TA_129A 2010-005 (N_S 1/412; day 4/9)
retrieve  DEFAULT: Downloading TA_129A 2010-006 (N_S 1/412; day 5/9)
retrieve  DEFAULT: Downloading TA_129A 2010-007 (N_S 1/412; day 6/9)
^CTraceback (most recent call last):
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/Users/chad/code/roverdev/rover/rover/__init__.py", line 7, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/Users/chad/code/roverdev/rover/rover/args.py", line 9, in <module>
    from .utils import create_parents, canonify, check_cmd
  File "/Users/chad/code/roverdev/rover/rover/utils.py", line 3, in <module>
    import datetime
  File "/Users/chad/miniconda3/lib/python3.6/datetime.py", line 7, in <module>
    import time as _time
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
$ ./dev/rover -f dr/rover.config retrieve req.txt 
retrieve  DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve  DEFAULT: Status available at http://127.0.0.1:8000
retrieve  DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve  DEFAULT: Downloading TA_129A 2010-001 (N_S 1/412; day 1/9)
retrieve  DEFAULT: Downloading TA_129A 2010-003 (N_S 1/412; day 2/9)
retrieve  DEFAULT: Downloading TA_129A 2010-004 (N_S 1/412; day 3/9)
retrieve  DEFAULT: Downloading TA_129A 2010-005 (N_S 1/412; day 4/9)
retrieve  DEFAULT: Downloading TA_129A 2010-006 (N_S 1/412; day 5/9)
retrieve  DEFAULT: Downloading TA_129A 2010-007 (N_S 1/412; day 6/9)
retrieve  DEFAULT: Downloading TA_129A 2010-008 (N_S 1/412; day 7/9)
retrieve  DEFAULT: Downloading TA_129A 2010-009 (N_S 1/412; day 8/9)
retrieve  DEFAULT: Downloading TA_129A 2010-010 (N_S 1/412; day 9/9)
retrieve  DEFAULT: Downloading TA_130A 2010-001 (N_S 2/412; day 1/10)
retrieve  DEFAULT: Downloading TA_130A 2010-002 (N_S 2/412; day 2/10)
retrieve  DEFAULT: Downloading TA_130A 2010-003 (N_S 2/412; day 3/10)
^CTraceback (most recent call last):
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 142, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/Users/chad/code/roverdev/rover/rover/__init__.py", line 7, in <module>
    from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
  File "/Users/chad/code/roverdev/rover/rover/args.py", line 3, in <module>
    from argparse import ArgumentParser, Action, RawDescriptionHelpFormatter, SUPPRESS
  File "/Users/chad/miniconda3/lib/python3.6/argparse.py", line 93, in <module>
    from gettext import gettext as _, ngettext
  File "/Users/chad/miniconda3/lib/python3.6/gettext.py", line 49, in <module>
    import locale, copy, io, os, re, struct, sys
  File "/Users/chad/miniconda3/lib/python3.6/locale.py", line 180, in <module>
    _percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
  File "/Users/chad/miniconda3/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/Users/chad/miniconda3/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/chad/miniconda3/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 614, in _parse
    if not item or (_len(item) == 1 and item[0][0] is AT):
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
chad-earthscope commented 6 years ago

From f7c12bfab40adfeea818d3417aa3c87d8894b504 I got:

$ ./dev/rover -f dr/rover.config retrieve req.txt 
retrieve  DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve  DEFAULT: Status available at http://127.0.0.1:8000
retrieve  DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve  DEFAULT: Downloading TA_214A 2010-008 (N_S 1/406; day 1/2)
retrieve  DEFAULT: Downloading TA_214A 2010-009 (N_S 1/406; day 2/2)
retrieve  DEFAULT: Downloading TA_221A 2010-001 (N_S 2/406; day 1/10)
retrieve  DEFAULT: Downloading TA_221A 2010-002 (N_S 2/406; day 2/10)
retrieve  DEFAULT: Downloading TA_221A 2010-003 (N_S 2/406; day 3/10)
retrieve  DEFAULT: Downloading TA_221A 2010-004 (N_S 2/406; day 4/10)
retrieve  DEFAULT: Downloading TA_221A 2010-005 (N_S 2/406; day 5/10)
^CCould not import runpy module
Traceback (most recent call last):
  File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 14, in <module>
    import importlib.machinery # importlib first so we can test #15386 via -m
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 577, in module_from_spec
  File "<frozen importlib._bootstrap>", line 556, in _init_module_attrs
  File "<frozen importlib._bootstrap>", line 409, in cached
  File "<frozen importlib._bootstrap_external>", line 362, in _get_cached
KeyboardInterrupt
retrieve    ERROR: Download failed (return code 1)
retrieve    ERROR: Download failed (return code -2)

But I had to run & cancel it many times to get that.

andrewcooke-isti commented 6 years ago

ufff. ok, i'll keep poking around. i'll probably back out the earlier fix as the interrupt-based solutions feels "better". what you have now seems to be from before python even gets to that (while starting a subprocess).

also, i think i may not be seeing this because my computer is "too good"(!) - likely has everything in ram cache. i will look at forcing it to flush that so that it's slower when starting processes. that may help me reproduce this.

also, robot tests are failing because of an unexpected version format, by the looks of things - will improve those too.

but tomorrow.

chad-earthscope commented 6 years ago

also, robot tests are failing because of an unexpected version format, by the looks of things - will improve those too.

but tomorrow.

Yep, cool.

The version strategy now is to set the version on new releases, tag it, then set it to "#.#.#plus" so that anyone that's using the code from master (instead of a tagged release) has a useful version number.

andrewcooke-isti commented 6 years ago

i've pushed changes to clean up the interrupt work and fix the tests.

chad-earthscope commented 6 years ago

Closing, thanks.

andrewcooke-isti commented 5 years ago

(i did ask around and the only fix anyone could think of was to write a C wrapper for the whole system, to catch the interrupt, which i imagine would over-complicate package distribution. it was also felt to be beyond the scope of the project...)