apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Custom s3 endpoint: Unable to execute HTTP request: Remote host terminated the handshake #10490

Open samueljackson92 opened 3 weeks ago

samueljackson92 commented 3 weeks ago

Apache Iceberg version

1.5.2 (latest release)

Query engine

None

Please describe the bug 🐞

Hi,

I am experimenting with setting up Iceberg locally, and I am trying to use a custom S3 endpoint as the storage backend for my project.

I am getting the following HTTP error when trying to create a new table:

Traceback (most recent call last):
  File "/Users/rt2549/miniconda3/envs/iceberg/lib/python3.11/site-packages/pyiceberg/catalog/rest.py", line 470, in create_table
    response.raise_for_status()
  File "/Users/rt2549/miniconda3/envs/iceberg/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Server Error for url: http://localhost:8181/v1/namespaces/default/tables

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rt2549/projects/test-db/ingest.py", line 29, in <module>
    main()
  File "/Users/rt2549/projects/test-db/ingest.py", line 22, in main
    table = catalog.create_table(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rt2549/miniconda3/envs/iceberg/lib/python3.11/site-packages/pyiceberg/catalog/rest.py", line 472, in create_table
    self._handle_non_200_response(exc, {409: TableAlreadyExistsError})
  File "/Users/rt2549/miniconda3/envs/iceberg/lib/python3.11/site-packages/pyiceberg/catalog/rest.py", line 382, in _handle_non_200_response
    raise exception(response) from exc
pyiceberg.exceptions.ServerError: SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake
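A note on reading this traceback: the text of the final `ServerError` looks like it was relayed from the REST catalog server's 500 response rather than generated by the Python client. The sketch below assumes the response body follows the Iceberg REST error shape (`{"error": {"message", "type", "code"}}`). The payload is a hypothetical reconstruction from the traceback, not a captured response, and `summarize_rest_error` is an illustrative helper, not a PyIceberg function.

```python
import json


def summarize_rest_error(body: str) -> str:
    """Format a REST catalog error body as '<type>: <message>'."""
    error = json.loads(body).get("error", {})
    return f"{error.get('type', 'UnknownError')}: {error.get('message', '')}"


# Hypothetical payload reconstructed from the traceback (not a captured response).
payload = json.dumps(
    {
        "error": {
            "message": "Unable to execute HTTP request: Remote host terminated the handshake",
            "type": "SdkClientException",
            "code": 500,
        }
    }
)

print(summarize_rest_error(payload))
```

`SdkClientException` is an AWS SDK for Java class, so if this reading is right, the failing handshake is probably between the REST container and the S3 endpoint, not between the script and `localhost:8181`.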

My ingestion script looks like the following:

from pyiceberg.catalog import load_catalog
import pyarrow as pa
import pyarrow.parquet as pq

def main():
    s3_config = {
        "uri": "http://localhost:8181",
        "s3.endpoint": "https://s3.echo.stfc.ac.uk",
        "s3.access-key-id": "<my-key>",
        "s3.secret-access-key": "<my-secret>",
        "s3.region": "us-east-1",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
    }
    catalog = load_catalog("default", **s3_config)
    df: pa.Table = pq.read_table("signals.parquet")
    df = df.drop_columns("description")

    catalog.create_namespace("default")

    table = catalog.create_table(
        "default.signals",
        schema=df.schema,
    )

if __name__ == "__main__":
    main()

My Docker Compose file is copied from the Iceberg tutorial:

version: "3"

services:
  rest:
    image: tabulario/iceberg-rest
    container_name: iceberg-rest
    networks:
      iceberg_net:
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=<access-key>
      - AWS_SECRET_ACCESS_KEY=<access-secret>
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://mast/test/warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=https://s3.echo.stfc.ac.uk
networks:
  iceberg_net:
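For readers unfamiliar with the `CATALOG_*` variables: as far as I understand the `tabulario/iceberg-rest` image, it turns each `CATALOG_`-prefixed environment variable into a catalog property by dropping the prefix, replacing `__` with `-`, replacing `_` with `.`, and lowercasing. The sketch below mirrors that convention in Python. The rule is an assumption about the image's behaviour, and `env_to_catalog_property` is a hypothetical helper for illustration only.

```python
def env_to_catalog_property(name: str, prefix: str = "CATALOG_") -> str:
    """Map an environment variable name to the catalog property it is assumed to set."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Double underscores become dashes first, then single underscores become dots.
    return name[len(prefix):].replace("__", "-").replace("_", ".").lower()


for var in ("CATALOG_WAREHOUSE", "CATALOG_IO__IMPL", "CATALOG_S3_ENDPOINT"):
    print(f"{var} -> {env_to_catalog_property(var)}")
```

Under that assumption, the compose file above sets `warehouse`, `io-impl`, and `s3.endpoint` on the server-side catalog.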

Using other tools, I can create and list files at that endpoint with the same S3 credentials without any problem.
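Since the error is a TLS handshake failure, one way to narrow it down is to attempt a handshake against the hosts the server might be contacting. The sketch below uses only the Python standard library. It probes the endpoint host itself and, as an untested hypothesis, the virtual-hosted-style host that an S3 client would use if path-style access were not enabled (bucket name `mast` taken from `CATALOG_WAREHOUSE`). The helper names are illustrative, and a successful Python handshake does not prove that the Java TLS stack inside the container will also succeed.

```python
import socket
import ssl
from urllib.parse import urlparse


def candidate_hosts(endpoint: str, bucket: str) -> list[str]:
    """Return the path-style host and the virtual-hosted-style host for a bucket."""
    host = urlparse(endpoint).hostname
    return [host, f"{bucket}.{host}"]


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and report the negotiated version or the failure."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"{host}: OK ({tls.version()})"
    except OSError as exc:  # ssl.SSLError and socket errors both derive from OSError
        return f"{host}: FAILED ({exc.__class__.__name__}: {exc})"


def report(endpoint: str, bucket: str) -> list[str]:
    """Probe every candidate host and collect one result line per host."""
    return [probe_tls(host) for host in candidate_hosts(endpoint, bucket)]
```

Calling `report("https://s3.echo.stfc.ac.uk", "mast")` returns one line per host. If only the bucket-prefixed host fails, Iceberg's `s3.path-style-access` property may be worth trying on the server side, though that is a guess rather than a confirmed fix.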

nastra commented 3 weeks ago

It complains because it can't access http://localhost:8181/v1/namespaces/default/tables. Make sure that the REST server is accessible via that URI.

samueljackson92 commented 2 weeks ago

Hi @nastra, thanks for your suggestion. I am not sure that this is the issue. If I navigate locally to that URI (http://localhost:8181/v1/namespaces/default/tables), I see the following output:

{
    "identifiers": []
}

If I navigate to http://localhost:8181/v1/namespaces/default I can see this output:

{
  "namespace": [
    "default"
  ],
  "properties": {
    "location": "s3://mast/test/warehouse/default"
  }
}

So the REST server seems to be accessible?
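The same reachability check can be scripted instead of done in a browser. The sketch below uses only the standard library and the two URLs quoted above. `catalog_url` and `get_status_and_body` are hypothetical helper names. Note that these are read-only GET requests, which may be answered from the catalog's own metadata store without contacting S3, whereas creating a table has to write a metadata file to the warehouse, so a successful GET does not rule out a storage-side failure (this is an assumption about the server's behaviour, not something confirmed in this thread).

```python
import urllib.error
import urllib.request


def catalog_url(base: str, *parts: str) -> str:
    """Build a REST catalog URL such as <base>/v1/namespaces/default/tables."""
    return "/".join([base.rstrip("/"), "v1", *parts])


def get_status_and_body(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Issue a GET and return (HTTP status, body text), including for error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a body that is worth reading.
        return exc.code, exc.read().decode("utf-8")


namespace_url = catalog_url("http://localhost:8181", "namespaces", "default")
tables_url = catalog_url("http://localhost:8181", "namespaces", "default", "tables")
print(namespace_url)
print(tables_url)
```

With the REST container running, `get_status_and_body(tables_url)` should return status 200 and the `identifiers` JSON shown above.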