dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Running tests in non-tox environment resulting in peer reset #1405

Closed CodyCBakerPhD closed 9 months ago

CodyCBakerPhD commented 9 months ago

attn @jwodder @yarikoptic @TheChymera

Looking for advice on fixing some failures when running the dandi-cli test suite in dev mode for the NWB Inspector, after we adjusted our strategy.

We originally followed the developer instructions here: https://github.com/dandi/dandi-cli/blob/master/DEVELOPMENT.md#running-tests-locally

The goal is to run the dandi-cli test suite against a dev branch of the NWB Inspector, to ensure there are no unexpected failures before cutting a new release.

But @jwodder pointed out that tox was creating its own isolated virtual environment that did not use the editable local version of the Inspector, so we switched from tox to invoking pytest directly: https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/.github/workflows/dandi-dev.yml#L21-L35

This worked well for a week or so, but over the past couple of days the runs have failed with the errors seen in these logs: https://github.com/NeurodataWithoutBorders/nwbinspector/actions/runs/7889319139/job/21531401494

which are all of the form

During handling of the above exception, another exception occurred:
venvs/dev3/lib/python3.9/site-packages/requests/adapters.py:486: in send
    resp = conn.urlopen(
venvs/dev3/lib/python3.9/site-packages/urllib3/connectionpool.py:799: in urlopen
    retries = retries.increment(
venvs/dev3/lib/python3.9/site-packages/urllib3/util/retry.py:550: in increment
    raise six.reraise(type(error), error, _stacktrace)
venvs/dev3/lib/python3.9/site-packages/urllib3/packages/six.py:769: in reraise
    raise value.with_traceback(tb)
venvs/dev3/lib/python3.9/site-packages/urllib3/connectionpool.py:715: in urlopen
    httplib_response = self._make_request(
venvs/dev3/lib/python3.9/site-packages/urllib3/connectionpool.py:467: in _make_request
    six.raise_from(e, None)
venvs/dev3/lib/python3.9/site-packages/urllib3/connectionpool.py:462: in _make_request
    httplib_response = conn.getresponse()
/usr/share/miniconda/envs/__setup_conda/lib/python3.9/http/client.py:1377: in getresponse
    response.begin()
/usr/share/miniconda/envs/__setup_conda/lib/python3.9/http/client.py:320: in begin
    version, status, reason = self._read_status()
/usr/share/miniconda/envs/__setup_conda/lib/python3.9/http/client.py:281: in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
/usr/share/miniconda/envs/__setup_conda/lib/python3.9/socket.py:704: in readinto
    return self._sock.recv_into(b)
E   urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Any ideas how to fix this? Retrying several times did not resolve it.

jwodder commented 9 months ago

@CodyCBakerPhD I don't believe the failures have anything to do with tox vs. non-tox, unless you happen to have an http_proxy envvar or similar set in your test environment. The linked failures all involve connection errors to http://purl.obolibrary.org, so I assume that site is simply having issues.
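One quick way to rule out the proxy possibility is to print any proxy-related variables visible to the test process. The snippet below is only a sketch: the helper name `find_proxy_settings` and the list of variable names are illustrative assumptions, not part of dandi-cli or requests.

```python
import os

# Conventional proxy variable names; this list is an assumption for the
# sketch, and each is checked in both lower and upper case.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def find_proxy_settings(environ=None):
    """Return the proxy-related variables that are set to a nonempty value."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            value = environ.get(key)
            if value:
                found[key] = value
    return found


# An empty dict means none of the listed variables are set.
print(find_proxy_settings())
```

Running this as a step in the failing workflow, just before pytest, would show whether the CI environment differs from a local one in this respect.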

CodyCBakerPhD commented 9 months ago

OK, so you suggest retrying after a couple of hours?

Is there a flag, environment variable, or other mode of running the dandi test suite that skips tests depending on external network access like that?

Just noting that I don't recall ever seeing errors like this before the recent changes (and this CI has run daily for the past couple of years); might be totally coincidental, though.

jwodder commented 9 months ago

@CodyCBakerPhD If DANDI_TESTS_NONETWORK is set to a nonempty string, all network-accessing tests will be skipped. Note that this includes everything that uses the Archive Docker image, so you'll be skipping a lot of tests.
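As a rough sketch of how that could be wired into a script, the variable can be set in a copy of the environment before launching pytest. The helper names `no_network_env` and `run_offline_tests` are made up for illustration, and the value `"1"` is arbitrary (any nonempty string should do, per the comment above); which pytest arguments to pass is left to the caller.

```python
import os
import subprocess
import sys


def no_network_env(base=None):
    """Copy an environment mapping and set DANDI_TESTS_NONETWORK in it."""
    env = dict(os.environ if base is None else base)
    # Any nonempty string should work; "1" is an arbitrary choice.
    env["DANDI_TESTS_NONETWORK"] = "1"
    return env


def run_offline_tests(pytest_args=()):
    """Run pytest in a subprocess with network tests skipped (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pytest", *pytest_args]
    return subprocess.run(cmd, env=no_network_env()).returncode
```

Copying the environment, rather than mutating `os.environ`, keeps the setting local to the one pytest run, so a second run with network tests enabled can follow in the same script.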

CodyCBakerPhD commented 9 months ago

OK, opened https://github.com/NeurodataWithoutBorders/nwbinspector/pull/435 with a strategy that splits the network-dependent tests from the non-dependent ones. Looking at which tests were not skipped in the latter case (https://github.com/NeurodataWithoutBorders/nwbinspector/actions/runs/7904129986/job/21573692838?pr=435), I'd say it covers a lot (but not all) of the DANDI tests that might be affected by the Inspector.

Thanks for the pointer; I do still (even as of today) see the same connection reset issue on our side. I'll try adjusting that specific workflow to more closely mimic your own: https://github.com/dandi/dandi-cli/blob/master/.github/workflows/test.yml