LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research, built by the Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Tests randomly have timeout errors when downloading test file #86

Closed: rly closed this issue 2 years ago

rly commented 3 years ago

In the CI test suite, we use kachery to download a test file: https://github.com/LorenFrankLab/nwb_datajoint/blob/bd2e0e6be22b5c52cec020eaee3d983376501c66/tests/test_1.py#L18-L20
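
For reference, the relevant lines of that test (they also appear in the traceback below) are:

```python
with ka.config(fr='default_readonly'):
    ka.load_file(
        'sha1://8ed68285c327b3766402ee75730d87994ac87e87/beans20190718_no_eseries_no_behavior.nwb',
        dest=str(nwbfile_path))
```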

This was working fine for weeks, but in the last 24 hours the test has failed a couple of times, seemingly at random, with a TimeoutError raised while kachery (via urllib) downloads the file.

Re-running the test often resolves the issue.
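
One possible stopgap (an untested sketch; `load_file_with_retries` is a hypothetical helper, and I'm assuming only that `ka.load_file` surfaces transient failures as `urllib.error.URLError` or `TimeoutError`, as in the traceback below) would be to retry the download before failing the test:

```python
import time
import urllib.error

import kachery as ka  # the 'ka' alias used in tests/test_1.py


def load_file_with_retries(uri, dest, attempts=3, delay=10.0):
    """Retry a kachery download a few times before letting the test fail."""
    for attempt in range(1, attempts + 1):
        try:
            with ka.config(fr='default_readonly'):
                return ka.load_file(uri, dest=dest)
        except (urllib.error.URLError, TimeoutError) as err:
            if attempt == attempts:
                raise
            print(f'Download attempt {attempt} failed ({err}); retrying in {delay} s')
            time.sleep(delay)
```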

@jsoules do you have any ideas about what might be wrong? Is the kachery server intermittently down, such that the CI gets a timeout? I have run this same download code on my local machine hundreds of times in a for loop and have not been able to reproduce the timeout error.
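
For concreteness, a minimal sketch of that reproduction loop (the destination path here is a placeholder; the real test writes under `NWB_DATAJOINT_BASE_DIR/raw`):

```python
import kachery as ka  # the 'ka' alias used in tests/test_1.py

# Repeat the same download the CI test performs, to try to trigger a timeout.
for i in range(500):
    with ka.config(fr='default_readonly'):
        ka.load_file(
            'sha1://8ed68285c327b3766402ee75730d87994ac87e87/beans20190718_no_eseries_no_behavior.nwb',
            dest='/tmp/test.nwb')
    print(f'iteration {i}: download succeeded')
```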

rly commented 3 years ago

It turns out that re-running the test makes the old error log difficult to find, so I have copied it here:

```
=================================== FAILURES ===================================
____________________________________ test_1 ____________________________________
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:1354: in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
              encode_chunked=req.has_header('Transfer-encoding'))
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:1256: in request
    self._send_request(method, url, body, headers, encode_chunked)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:1302: in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:1251: in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:1011: in _send_output
    self.send(msg)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:951: in send
    self.connect()
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/http/client.py:922: in connect
    self.sock = self._create_connection((self.host, self.port),
                                        self.timeout, self.source_address)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/socket.py:808: in create_connection
    raise err
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
address = ('kachery.flatironinstitute.org', 80)

            sock.connect(sa)
E           TimeoutError: [Errno 110] Connection timed out

/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/socket.py:796: TimeoutError

During handling of the above exception, another exception occurred:

    def test_1():
        print("In test_1, os.environ['NWB_DATAJOINT_BASE_DIR'] is", os.environ['NWB_DATAJOINT_BASE_DIR'])
        raw_dir = pathlib.Path(os.environ['NWB_DATAJOINT_BASE_DIR']) / 'raw'
        nwbfile_path = raw_dir / 'test.nwb'
        from nwb_datajoint.common import Session, DataAcquisitionDevice, CameraDevice, Probe
        from nwb_datajoint.data_import import insert_sessions
        with ka.config(fr='default_readonly'):
>           ka.load_file('sha1://8ed68285c327b3766402ee75730d87994ac87e87/beans20190718_no_eseries_no_behavior.nwb', dest=str(nwbfile_path))

tests/test_1.py:19:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/kachery/core.py:230: in load_file
    return _hash_caches[algorithm].downloadFile(url=url0, hash=hash0, size=size0, target_path=dest)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/kachery/localhashcache.py:130: in downloadFile
    hash_b = steady_download_and_compute_hash(url=url, algorithm=self._algorithm, target_path=path_tmp)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/kachery/steady_download_and_compute_hash.py:11: in steady_download_and_compute_hash
    remote = urllib.request.urlopen(url)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:525: in open
    response = self._open(req, data)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:542: in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:1383: in http_open
    return self.do_open(http.client.HTTPConnection, req)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
host = 'kachery.flatironinstitute.org'

            try:
                h.request(req.get_method(), req.selector, req.data, headers,
                          encode_chunked=req.has_header('Transfer-encoding'))
            except OSError as err:  # timeout error
>               raise URLError(err)
E               urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

/usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/urllib/request.py:1357: URLError
----------------------------- Captured stdout call -----------------------------
In test_1, os.environ['NWB_DATAJOINT_BASE_DIR'] is /tmp/tmpvr094zgf
Downloading file --- (381.2 KiB): http://kachery.flatironinstitute.org/neuro1/get/sha1/8ed68285c327b3766402ee75730d87994ac87e87?channel=public&signature=bd9e96f7035ffc13d017cb187ce04b26e6a526a6 -> /tmp/tmpvr094zgf/raw/test.nwb
=============================== warnings summary ===============================
../../../../../usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/datajoint/settings.py:61
  /usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/datajoint/settings.py:61: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    class Config(collections.MutableMapping):

../../../../../usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/datajoint/external.py:2
  /usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/datajoint/external.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    from collections import Mapping

tests/data_import/test_insert_sessions.py: 24 warnings
  /usr/share/miniconda/envs/nwb_datajoint/lib/python3.8/site-packages/h5py/_hl/dataset.py:541: DeprecationWarning: Passing None into shape arguments as an alias for () is deprecated.
    arr = numpy.ndarray(selection.mshape, dtype=new_dtype)

-- Docs: docs.pytest.org/en/stable/warnings.html
==================================== PASSES ====================================
________________________________ test_copy_nwb _________________________________
------------------------------ Captured log setup ------------------------------
INFO datajoint.settings:settings.py:206 Setting stores to {'raw': {'protocol': 'file', 'location': '/tmp/pytest-of-runner/pytest-0/test_copy_nwb0/nwb-data/raw', 'stage': '/tmp/pytest-of-runner/pytest-0/test_copy_nwb0/nwb-data/raw'}}
----------------------------- Captured stdout call -----------------------------
Creating a copy of NWB file raw.nwb with link to raw ephys data: raw_no_ephys.nwb
============= 1 failed, 3 passed, 26 warnings in 135.18s (0:02:15) =============
Terminating datajoint compute resource process
Terminating datajoint server
+ docker kill datajoint-server-pytest
datajoint-server-pytest
+ docker rm datajoint-server-pytest
datajoint-server-pytest
:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 88 from PyObject
Error: Process completed with exit code 1.
```

jsoules commented 3 years ago

This occurred within the last 24 hours? What channel are you downloading the test file from?

We had a power transformer explode in front of the building on Thursday afternoon and had to bring down the cluster and all workstations here as a precautionary measure. For at least some channels, that meant there were no longer any peers online that actually had the files, until around 11 AM EDT yesterday when we brought things back up.

Ryan, can you confirm whether the tests in question are downloading files provided by a peer you control? If possible, it'd be nice to eliminate remote server status as a variable for these tests in the future.

rly commented 3 years ago

Thanks for the quick response, @jsoules! The timeout errors occurred around 1 pm and 8 pm ET on Friday. I believe Jeremy set up this test initially and did not configure pulling the file from a particular channel, so I assume it is being hosted only at the Flatiron Institute. The error has not returned in the last three nightly CI builds, so it was probably a temporary issue related to the Flatiron cluster shutdown.

In any case, I agree: the file should be hosted by a kachery server that we control. I'll look into that.
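
One likely shape for that (a sketch only; `fetch_test_file`, the `tests/data` path, and the repo-local mirror are all hypothetical until we decide where to host the file):

```python
import pathlib
import shutil

import kachery as ka  # the 'ka' alias used in tests/test_1.py

# Hypothetical local mirror (committed to the repo or synced from a server we
# control); fall back to the public kachery node only when it is missing.
LOCAL_COPY = pathlib.Path('tests/data/beans20190718_no_eseries_no_behavior.nwb')


def fetch_test_file(dest: pathlib.Path) -> None:
    if LOCAL_COPY.exists():
        shutil.copy(LOCAL_COPY, dest)
    else:
        with ka.config(fr='default_readonly'):
            ka.load_file(
                'sha1://8ed68285c327b3766402ee75730d87994ac87e87/beans20190718_no_eseries_no_behavior.nwb',
                dest=str(dest))
```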

lfrank commented 2 years ago

I believe this has been fixed.