h-vetinari opened 6 months ago
So the CI in #118 passed both in the PR and after the merge (5 individual runs each), which leads me to believe that this issue is solved by:
diff --git a/python/pyarrow/tests/conftest.py b/python/pyarrow/tests/conftest.py
index 57bc3c8fc..f9396b898 100644
--- a/python/pyarrow/tests/conftest.py
+++ b/python/pyarrow/tests/conftest.py
@@ -192,7 +192,7 @@ def retry(attempts=3, delay=1.0, max_delay=None, backoff=1):
 @pytest.fixture(scope='session')
 def s3_server(s3_connection, tmpdir_factory):
-    @retry(attempts=5, delay=0.1, backoff=2)
+    @retry(attempts=10, delay=1, backoff=2)
     def minio_server_health_check(address):
         resp = urllib.request.urlopen(f"http://{address}/minio/health/cluster")
         assert resp.getcode() == 200
Would you consider merging this upstream @raulcd @assignUser @kou? Given that s3_server is only initialized once per session, I think it's not a problem if it takes a bit more time? It probably also works without the bump in attempts already; that should keep the maximum time spent on a failing fixture setup limited to 1+2+4+8+16 = 31 sec.
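To illustrate the arithmetic: a quick sketch of the worst-case setup time, where the parameter names mirror the retry signature from the diff. The helper max_backoff_time below is hypothetical and assumes the decorator sleeps after every failed attempt, multiplying the delay by backoff each time with no max_delay cap.

```python
# Hypothetical helper: total time spent sleeping if every attempt fails,
# assuming one back-off sleep per failed attempt and no max_delay cap.
def max_backoff_time(attempts, delay, backoff):
    total, current = 0.0, delay
    for _ in range(attempts):
        total += current
        current *= backoff
    return total

print(max_backoff_time(attempts=5, delay=1, backoff=2))   # 1+2+4+8+16 = 31 sec (delay bump only)
print(max_backoff_time(attempts=10, delay=1, backoff=2))  # 1023 sec worst case with the bumped attempts from the diff
```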
Yeah, looks good. I don't think there is an issue with the fix as it's only in a test and won't change anything for users :)
This has been going on for a while; when we catch a slow agent, the tests on aarch64 fail as follows:
I cannot actually see the URL, but it all seems to be S3-related, and there's the following stack trace (I haven't verified that it applies to all failures, but probably):
Usually restarting once or twice solves the issue, but it's getting annoying due to the sheer number of times it's happening.
The function minio_server_health_check already has a built-in retry/back-off, so I think it's worth trying to increase that (at least for the feedstock here).
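For context, here is a minimal sketch of what a retry/back-off decorator with that signature could look like. It matches the signature visible in the hunk header of the diff above (retry(attempts=3, delay=1.0, max_delay=None, backoff=1)), but it is an illustration under those assumptions, not the exact implementation in python/pyarrow/tests/conftest.py.

```python
import functools
import time

# Sketch of a retry decorator with exponential back-off; signature taken from
# the hunk header above, implementation details are illustrative only.
def retry(attempts=3, delay=1.0, max_delay=None, backoff=1):
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            current_delay = delay
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Re-raise once the final attempt has failed.
                    if attempt == attempts - 1:
                        raise
                    time.sleep(current_delay)
                    current_delay *= backoff
                    if max_delay is not None:
                        current_delay = min(current_delay, max_delay)
        return wrapper
    return decorate
```

With attempts=10, delay=1, backoff=2 as in the diff, a decorator along these lines would keep polling the minio health endpoint for considerably longer before giving up, which is the point of the change for slow agents.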