Closed chad-earthscope closed 6 years ago
The "inability to restart" aspect of this issue has been moved to #47.
The original reported behavior was on CentOS (Linux). I tried to duplicate this on macOS and a single Control-C (SIGINT) quickly shut down rover while it was doing multiple retrieves. Trying again in Linux, it quickly shut down but once left rover in a non-restartable state due to empty files, leading to #48.
Thus far I have not been able to reproduce the long, drawn out wait for a shutdown, perhaps that was an anomaly.
Regardless, the shutdown is never graceful as there is always at least one Traceback, in some cases many of them, dumped to the console. Users should not be shown a Traceback when shutting down a rover retrieve in the most common way one would shutdown a program run on the command line.
Fixed in 3d113c8
More ungraceful stack traces, but no delays:
...
retrieve DEFAULT: Downloading TA_113A 2010-008 (N_S 1/420; day 8/10)
retrieve DEFAULT: Downloading TA_113A 2010-009 (N_S 1/420; day 9/10)
retrieve DEFAULT: Downloading TA_113A 2010-010 (N_S 1/420; day 10/10)
^CTraceback (most recent call last):
File "/---/bin/rover", line 7, in <module>
from rover import main
File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
from .utils import create_parents, canonify, check_cmd
File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
from requests import Session
File "/---/lib/python3.6/site-packages/requests/__init__.py", line 84, in <module>
from urllib3.contrib import pyopenssl
File "/---/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 48, in <module>
from cryptography.hazmat.backends.openssl import backend as openssl_backend
File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
from cryptography.hazmat.backends.openssl.backend import backend
File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 59, in <module>
from cryptography.hazmat.primitives.ciphers.algorithms import (
File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/algorithms.py", line 25, in <module>
class AES(object):
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
...
retrieve DEFAULT: Downloading TA_121A 2010-001 (N_S 2/420; day 1/10)
retrieve DEFAULT: Downloading TA_121A 2010-002 (N_S 2/420; day 2/10)
retrieve DEFAULT: Downloading TA_121A 2010-003 (N_S 2/420; day 3/10)
retrieve DEFAULT: Downloading TA_121A 2010-004 (N_S 2/420; day 4/10)
^CTraceback (most recent call last):
File "/---/bin/rover", line 7, in <module>
from rover import main
File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
from .utils import create_parents, canonify, check_cmd
File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
from requests import Session
File "/---/lib/python3.6/site-packages/requests/__init__.py", line 84, in <module>
from urllib3.contrib import pyopenssl
File "/---/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 48, in <module>
from cryptography.hazmat.backends.openssl import backend as openssl_backend
File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/__init__.py", line 7, in <module>
from cryptography.hazmat.backends.openssl.backend import backend
File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 24, in <module>
from cryptography.hazmat.backends.openssl.ciphers import _CipherContext
File "/---/lib/python3.6/site-packages/cryptography/hazmat/backends/openssl/ciphers.py", line 9, in <module>
from cryptography.hazmat.primitives import ciphers
File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/__init__.py", line 7, in <module>
from cryptography.hazmat.primitives.ciphers.base import (
File "/---/lib/python3.6/site-packages/cryptography/hazmat/primitives/ciphers/base.py", line 21, in <module>
class CipherAlgorithm(object):
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
...
retrieve DEFAULT: Downloading TA_124A 2010-010 (N_S 3/419; day 10/10)
retrieve DEFAULT: Downloading TA_125A 2010-001 (N_S 4/419; day 1/10)
retrieve DEFAULT: Downloading TA_125A 2010-002 (N_S 4/419; day 2/10)
^CTraceback (most recent call last):
Traceback (most recent call last):
File "/---/bin/rover", line 7, in <module>
File "/---/bin/rover", line 7, in <module>
from rover import main
File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
from rover import main
File "/---/lib/python3.6/site-packages/rover/__init__.py", line 15, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/---/lib/python3.6/site-packages/rover/args.py", line 20, in <module>
from .utils import create_parents, canonify, check_cmd
File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
from requests import Session
from .utils import create_parents, canonify, check_cmd
File "/---/lib/python3.6/site-packages/requests/__init__.py", line 44, in <module>
File "/---/lib/python3.6/site-packages/rover/utils.py", line 28, in <module>
from requests import Session
File "/---/lib/python3.6/site-packages/requests/__init__.py", line 44, in <module>
import chardet
File "/---/lib/python3.6/site-packages/chardet/__init__.py", line 20, in <module>
import chardet
File "/---/lib/python3.6/site-packages/chardet/__init__.py", line 20, in <module>
from .universaldetector import UniversalDetector
File "/---/lib/python3.6/site-packages/chardet/universaldetector.py", line 47, in <module>
from .universaldetector import UniversalDetector
File "/---/lib/python3.6/site-packages/chardet/universaldetector.py", line 47, in <module>
from .mbcsgroupprober import MBCSGroupProber
from .mbcsgroupprober import MBCSGroupProber
File "/---/lib/python3.6/site-packages/chardet/mbcsgroupprober.py", line 31, in <module>
File "/---/lib/python3.6/site-packages/chardet/mbcsgroupprober.py", line 36, in <module>
from .cp949prober import CP949Prober
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
from .utf8prober import UTF8Prober
File "/---/lib/python3.6/site-packages/chardet/utf8prober.py", line 31, in <module>
from .mbcssm import UTF8_SM_MODEL
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 674, in exec_module
File "<frozen importlib._bootstrap_external>", line 674, in exec_module
File "<frozen importlib._bootstrap_external>", line 764, in get_code
File "<frozen importlib._bootstrap_external>", line 758, in get_code
File "<frozen importlib._bootstrap_external>", line 832, in get_data
File "<frozen importlib._bootstrap_external>", line 842, in path_stats
KeyboardInterrupt
File "<frozen importlib._bootstrap_external>", line 82, in _path_stat
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
retrieve ERROR: Download failed (return code 1)
these are all(?) hitting ctrl-C as soon as you start? i played around and that's the only way i could repeat them (it wasn't til i looked at the stack trace that i worked out that was the possibility).
although because rover starts up processes quite often i guess it could happen at other times too.
anyway, i don't know how to fix these. as far as i can tell they occur when hitting Ctrl-C while Python is still "starting up" and before "my" code "takes control".
one idea is that, since they all appear to be in crypto related calls related to HTTP (request), that we do that import locally (inside the scope of a function). i'll try that and see if it helps. but it's only patching a particular case. in general i don't know how to fix this (although i'll google round).
these are all(?) hitting ctrl-C as soon as you start?
These are either after waiting 5 seconds, or 20 seconds or 60 second or 120 second, ish. Basically, let it get going and cancel. All of them are after downloads have started.
weird. ok, still looking.
can you try the latest commit (just now) - 789e054?
not trying to be difficult here, but really not seeing this except if i hammer on Ctrl-C at startup.
all the above were while importing request's Session - i've put requests and smtp_llib imports in-line so that any exception thrown can be caught and managed. for me it seems to have made things a little better, but i can still get problems in other libraries. curious how it is for you.
sorry commit 590d9f2
there's some discussion of this problem here. wondering if there's any way to set a different handler.
ok, try f7c12bf - i've replaced the interrupt handler. this fixes things for me and doesn't seem to have any adverse effects, but please don't consider this final - i'm running tests now. submitted in case you're still around and can give feedback.
At first, from both 789e05464447e9bcb8fc8aa8b9302eb30cfeeae5 and 590d9f26c81ac21f2968b39254d2fb772f030d8b it seemed to have gotten a lot better. But then I got these:
$ ./dev/rover -f dr/rover.config retrieve req.txt
retrieve DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve DEFAULT: Status available at http://127.0.0.1:8000
retrieve DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve DEFAULT: Downloading TA_129A 2010-001 (N_S 1/412; day 1/9)
retrieve DEFAULT: Downloading TA_129A 2010-003 (N_S 1/412; day 2/9)
retrieve DEFAULT: Downloading TA_129A 2010-004 (N_S 1/412; day 3/9)
retrieve DEFAULT: Downloading TA_129A 2010-005 (N_S 1/412; day 4/9)
retrieve DEFAULT: Downloading TA_129A 2010-006 (N_S 1/412; day 5/9)
retrieve DEFAULT: Downloading TA_129A 2010-007 (N_S 1/412; day 6/9)
^CTraceback (most recent call last):
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 142, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/chad/code/roverdev/rover/rover/__init__.py", line 7, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/Users/chad/code/roverdev/rover/rover/args.py", line 9, in <module>
from .utils import create_parents, canonify, check_cmd
File "/Users/chad/code/roverdev/rover/rover/utils.py", line 3, in <module>
import datetime
File "/Users/chad/miniconda3/lib/python3.6/datetime.py", line 7, in <module>
import time as _time
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
$ ./dev/rover -f dr/rover.config retrieve req.txt
retrieve DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve DEFAULT: Status available at http://127.0.0.1:8000
retrieve DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve DEFAULT: Downloading TA_129A 2010-001 (N_S 1/412; day 1/9)
retrieve DEFAULT: Downloading TA_129A 2010-003 (N_S 1/412; day 2/9)
retrieve DEFAULT: Downloading TA_129A 2010-004 (N_S 1/412; day 3/9)
retrieve DEFAULT: Downloading TA_129A 2010-005 (N_S 1/412; day 4/9)
retrieve DEFAULT: Downloading TA_129A 2010-006 (N_S 1/412; day 5/9)
retrieve DEFAULT: Downloading TA_129A 2010-007 (N_S 1/412; day 6/9)
retrieve DEFAULT: Downloading TA_129A 2010-008 (N_S 1/412; day 7/9)
retrieve DEFAULT: Downloading TA_129A 2010-009 (N_S 1/412; day 8/9)
retrieve DEFAULT: Downloading TA_129A 2010-010 (N_S 1/412; day 9/9)
retrieve DEFAULT: Downloading TA_130A 2010-001 (N_S 2/412; day 1/10)
retrieve DEFAULT: Downloading TA_130A 2010-002 (N_S 2/412; day 2/10)
retrieve DEFAULT: Downloading TA_130A 2010-003 (N_S 2/412; day 3/10)
^CTraceback (most recent call last):
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 142, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/chad/code/roverdev/rover/rover/__init__.py", line 7, in <module>
from .args import INIT_REPOSITORY, INDEX, INGEST, LIST_INDEX, \
File "/Users/chad/code/roverdev/rover/rover/args.py", line 3, in <module>
from argparse import ArgumentParser, Action, RawDescriptionHelpFormatter, SUPPRESS
File "/Users/chad/miniconda3/lib/python3.6/argparse.py", line 93, in <module>
from gettext import gettext as _, ngettext
File "/Users/chad/miniconda3/lib/python3.6/gettext.py", line 49, in <module>
import locale, copy, io, os, re, struct, sys
File "/Users/chad/miniconda3/lib/python3.6/locale.py", line 180, in <module>
_percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
File "/Users/chad/miniconda3/lib/python3.6/re.py", line 233, in compile
return _compile(pattern, flags)
File "/Users/chad/miniconda3/lib/python3.6/re.py", line 301, in _compile
p = sre_compile.compile(pattern, flags)
File "/Users/chad/miniconda3/lib/python3.6/sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 765, in _parse
p = _parse_sub(source, state, sub_verbose, nested + 1)
File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "/Users/chad/miniconda3/lib/python3.6/sre_parse.py", line 614, in _parse
if not item or (_len(item) == 1 and item[0][0] is AT):
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
From f7c12bfab40adfeea818d3417aa3c87d8894b504 I got:
$ ./dev/rover -f dr/rover.config retrieve req.txt
retrieve DEFAULT: Rover version 0.0.8plus - starting retrieve
retrieve DEFAULT: Status available at http://127.0.0.1:8000
retrieve DEFAULT: Trying new retrieval (attempt 1 of 3)
retrieve DEFAULT: Downloading TA_214A 2010-008 (N_S 1/406; day 1/2)
retrieve DEFAULT: Downloading TA_214A 2010-009 (N_S 1/406; day 2/2)
retrieve DEFAULT: Downloading TA_221A 2010-001 (N_S 2/406; day 1/10)
retrieve DEFAULT: Downloading TA_221A 2010-002 (N_S 2/406; day 2/10)
retrieve DEFAULT: Downloading TA_221A 2010-003 (N_S 2/406; day 3/10)
retrieve DEFAULT: Downloading TA_221A 2010-004 (N_S 2/406; day 4/10)
retrieve DEFAULT: Downloading TA_221A 2010-005 (N_S 2/406; day 5/10)
^CCould not import runpy module
Traceback (most recent call last):
File "/Users/chad/miniconda3/lib/python3.6/runpy.py", line 14, in <module>
import importlib.machinery # importlib first so we can test #15386 via -m
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
File "<frozen importlib._bootstrap>", line 577, in module_from_spec
File "<frozen importlib._bootstrap>", line 556, in _init_module_attrs
File "<frozen importlib._bootstrap>", line 409, in cached
File "<frozen importlib._bootstrap_external>", line 362, in _get_cached
KeyboardInterrupt
retrieve ERROR: Download failed (return code 1)
retrieve ERROR: Download failed (return code -2)
But I had to run & cancel it many times to get that.
ufff. ok, i'll keep poking around. i'll probably back out the earlier fix as the interrupt-based solutions feels "better". what you have now seems to be from before python even gets to that (while starting a subprocess).
also, i think i may not be seeing this because my computer is "too good"(!) - likely has everything in ram cache. i will look at forcing it to flush that so that it's slower when starting processes. that may help me reproduce this.
also, robot tests are failing because of an unexpected version format, by the looks of things - will improve those too.
but tomorrow.
also, robot tests are failing because of an unexpected version format, by the looks of things - will improve those too.
but tomorrow.
Yep, cool.
The version strategy now is to set the version on new releases, tag it, then set it to "#.#.#plus" so that anyone that's using the code from master (instead of a tagged release) has a useful version number.
i've pushed changes to clean up the interrupt work and fix the tests.
Closing, thanks.
(i did ask around and the only fix anyone could think of was to write a C wrapper for the whole system, to catch the interrupt, which i imagine would over-complicate package distribution. it was also felt to be beyond the scope of the project...)
After a service outage at the DMC, a long running rover retrieve was spinning out errors, such as:
After pressing Control-C on the console, nothing apparently happened, if there was a message on the screen it went by quickly.
Pressing Control-C many more times resulting in a numerous Trackbacks, ala:
Finally, after some a minute or so, rover quit.
The
TA_2012.req
contains:Rover needs to be interrupt-able and restartable as it is expected to run for long periods of time.