Segmentation fault on ECS Fargate when reading from S3 #33771

Open bryzgaloff opened 2 years ago

bryzgaloff commented 2 years ago

Describe what's wrong

I am trying to execute select * from s3(…) and ClickHouse dies with the following output:

SELECT *
FROM s3('…', 'TSV', 'root String')

Query id: 86972283-b218-493b-a56b-c1125f083ce6

[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066552 [ 251 ] <Fatal> BaseDaemon: ########################################
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066602 [ 251 ] <Fatal> BaseDaemon: (version (official build), build id: FA4A7F489F3FF6E3) (from thread 97) (query_id: 86972283-b218-493b-a56b-c1125f083ce6) Received signal Segmentation fault (11)
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066622 [ 251 ] <Fatal> BaseDaemon: Address: 0x7f2172f7a5c0 Access: read. Attempted access has violated the permissions assigned to the memory area.
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066636 [ 251 ] <Fatal> BaseDaemon: Stack trace: 0x1257420f 0x7f2207ded3c0
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066666 [ 251 ] <Fatal> BaseDaemon: 0. ? @ 0x1257420f in /usr/bin/clickhouse
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.066700 [ 251 ] <Fatal> BaseDaemon: 1. ? @ 0x7f2207ded3c0 in ?
[85b9a88609d0454f8fb03fa59362654d-2580379071] 2022.01.19 07:46:56.194581 [ 251 ] <Fatal> BaseDaemon: Calculated checksum of the binary: 5BEBF5792A40F7E345921EDA3698245B. There is no information about the reference checksum.
Exception on client:
Code: 32. DB::Exception: Attempt to read after eof: while receiving packet from (ATTEMPT_TO_READ_AFTER_EOF)

Connecting to as user default.
Code: 210. DB::NetException: Connection refused ( (NETWORK_ERROR)

Does it reproduce on recent release?

I use the Docker image yandex/clickhouse-server:

How to reproduce

I have defined the following Dockerfile:

FROM yandex/clickhouse-server:
RUN echo '<clickhouse><s3><s3><endpoint></endpoint><use_environment_credentials>true</use_environment_credentials></s3></s3></clickhouse>' \
        > /etc/clickhouse-server/config.d/s3.xml

All it does is set use_environment_credentials=true for my bucket in the eu-central-1 AWS region. This is required because I am running ClickHouse on ECS Fargate (in a container) with an IAM role attached to the task.
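The one-line config is hard to read, so here is a minimal Python sketch (an illustration only, not part of the original report) that parses the same XML string and reports the flag per endpoint entry. As far as I understand the config layout, the inner `<s3>` element is simply the name given to the endpoint entry; the `<endpoint>` value is empty because it is elided in the report. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The exact string echoed by the Dockerfile; the endpoint URL is elided in the report.
S3_CONFIG = (
    "<clickhouse><s3><s3><endpoint></endpoint>"
    "<use_environment_credentials>true</use_environment_credentials>"
    "</s3></s3></clickhouse>"
)


def read_use_environment_credentials(xml_text: str) -> dict:
    """Map each endpoint entry under the top-level <s3> to its flag value."""
    root = ET.fromstring(xml_text)
    flags = {}
    # Children of the outer <s3> are the named endpoint entries.
    for entry in root.find("s3"):
        flags[entry.tag] = entry.findtext("use_environment_credentials") == "true"
    return flags


if __name__ == "__main__":
    print(read_use_environment_credentials(S3_CONFIG))
```

This only checks that the file says what it is meant to say; it does not tell you how the server interprets it.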

Here is the service definition in Terraform:

locals {
  clickhouse-server-http-port   = 8123
  clickhouse-server-native-port = 9000
}

resource "aws_ecs_task_definition" "clickhouse-server" {
  family                   = "clickhouse-server"
  cpu                      = 2048
  memory                   = 8192
  network_mode             = "awsvpc"
  execution_role_arn       = aws_iam_role.ecs-exec.arn
  requires_compatibilities = ["FARGATE"]
  task_role_arn            = aws_iam_role.clickhouse-access-s3.arn  # HERE I ATTACH A ROLE TO USE TO ACCESS S3
  container_definitions    = jsonencode([
    {
      name             = "clickhouse-server"
      image            = # my custom image
      portMappings     = [
        { containerPort = local.clickhouse-server-http-port },
        { containerPort = local.clickhouse-server-native-port },
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options   = {
          awslogs-group         =
          awslogs-region        =
          awslogs-stream-prefix = "clickhouse-server"
        }
      }
    }
  ])
}

resource "aws_iam_role" "clickhouse-access-s3" {
  name               = "clickhouse-access-s3"
  assume_role_policy = data.aws_iam_policy_document.assume-ecs-execution-role.json
  inline_policy {
    name   = "clickhouse-access-s3"
    policy = data.aws_iam_policy_document.clickhouse-access-s3.json
  }
}

data "aws_iam_policy_document" "clickhouse-access-s3" {
  statement {
    effect    = "Allow"
    actions   = ["s3:*"]
    resources = []  # proper S3 permissions here
  }
}

resource "aws_ecs_cluster" "clickhouse-server" {
  name = "clickhouse-server"
}

resource "aws_ecs_service" "clickhouse-server" {
  name            = "clickhouse-server"
  launch_type     = "FARGATE"
  desired_count   = 1
  task_definition = aws_ecs_task_definition.clickhouse-server.arn
  cluster         = aws_ecs_cluster.clickhouse-server.arn
  # proper network_configuration {…}
}

resource "aws_cloudwatch_log_group" "clickhouse-server" {
  name = "clickhouse-server"
}

When using the s3 table function without the config file (running the container from the plain yandex/clickhouse-server: image, without my custom Dockerfile), I was able to read this file from S3 by providing explicit AWS credentials. Now that the configuration is added, the operation fails even with AWS credentials provided explicitly. The error is the same: see the traceback above; the ClickHouse container dies.

The setup also works when the container is run with plain docker run on an EC2 instance (without the Fargate service). The error also does not appear if the service is run without a task role attached (task_role_arn=null above).

Expected behavior

I expect the s3 table function to use $AWS_CONTAINER_CREDENTIALS_RELATIVE_URI when use_environment_credentials is enabled, and it looks like the codebase supports it:


S3 endpoint settings documentation.
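For context, this is roughly how container credentials are resolved by AWS SDKs in general: the relative URI is appended to the fixed ECS credentials address 169.254.170.2, and the endpoint returns a JSON document with AccessKeyId, SecretAccessKey and Token. The Python sketch below only illustrates that flow under those assumptions; it is not ClickHouse's implementation, and the function names are hypothetical.

```python
import json
import os
import urllib.request

# Fixed link-local address of the ECS task credentials endpoint.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(env=os.environ):
    """Return the URL a container credentials provider would query, or None."""
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative:
        return ECS_CREDENTIALS_HOST + relative
    # Fall back to the full-URI variable when no relative URI is set.
    return env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")


def parse_credentials(body: bytes) -> dict:
    """Extract the temporary credentials from the endpoint's JSON response."""
    doc = json.loads(body)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": doc.get("Expiration"),
    }


def fetch_container_credentials(env=os.environ, timeout=2.0) -> dict:
    """Query the endpoint; only works where such an endpoint is reachable."""
    url = container_credentials_url(env)
    if url is None:
        raise RuntimeError("no container credentials URI in the environment")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_credentials(resp.read())
```

Running `fetch_container_credentials()` inside the Fargate task is a quick way to check that the task role itself is reachable, independently of ClickHouse.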

tavplubix commented 2 years ago

Looks related to #29492

iamthen0ise commented 2 years ago

I'd like to subscribe to this bug, because my problem is still not solved.

dlahn commented 1 year ago

@bryzgaloff Did you ever solve this?

bryzgaloff commented 1 year ago

Hi @dlahn, I decided to use explicit AWS credentials in the s3(…) call.
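A minimal sketch of that workaround, assuming the s3 table function argument order path, access key ID, secret access key, format, structure. The helper names, the URL and the key values are placeholders for illustration, not values from this issue.

```python
def quote(value: str) -> str:
    """Quote a string as a single-quoted SQL literal, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def s3_select(path, access_key_id, secret_access_key, fmt, structure) -> str:
    """Build a SELECT over s3() that passes the credentials explicitly."""
    args = ", ".join(
        quote(a) for a in (path, access_key_id, secret_access_key, fmt, structure)
    )
    return f"SELECT * FROM s3({args})"


if __name__ == "__main__":
    # Placeholder values only.
    print(s3_select("https://example.invalid/bucket/file.tsv",
                    "KEY", "SECRET", "TSV", "root String"))
```

Note that credentials passed this way end up in the query text, so treat query logs accordingly.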

alexey-milovidov commented 9 months ago

All it does is set use_environment_credentials=true for my bucket

By the way, this is no longer required, as we use environment credentials by default.

jdaripineni commented 6 months ago

I'm experiencing the same issue with S3 connectivity with the AWS credentials provider (when using AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_ABSOLUTE_URI).

These are the s3 and storage configs:

      s3.xml: |-
      storage.xml: |-

Sharu95 commented 2 months ago

I'm experiencing the same issue. Does anyone have any pointers as to where the problem might be? Happy to contribute, and I'd appreciate any input on where to start looking 😄

Sharu95 commented 2 months ago

Probably also related to #43820?

jakedaleweb commented 1 month ago

Also seeing this issue when trying to set up S3 as the data store for an ECS cluster running ClickHouse.

alexey-milovidov commented 1 month ago

Trivially reproducible in CloudShell:

alexey-milovidov commented 1 month ago

Reproducible without ECS:

AWS_CONTAINER_CREDENTIALS_FULL_URI=http://localhost:1338/latest/meta-data/container/security-credentials ch --query "SELECT * FROM s3('s3://clickhouse-public-datasets/tranco/*')"
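The thread does not say whether anything needs to be listening on port 1338 for the crash to occur. If you want to test with an endpoint that actually responds, a hypothetical stub like the Python sketch below could serve fake credentials at the same path. The response shape (AccessKeyId, SecretAccessKey, Token, Expiration) follows the usual container credentials format; the values and helper names are made up.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CREDENTIALS_PATH = "/latest/meta-data/container/security-credentials"

# Obviously fake values; never put real credentials here.
FAKE_CREDENTIALS = {
    "AccessKeyId": "FAKEACCESSKEY",
    "SecretAccessKey": "fake-secret",
    "Token": "fake-token",
    "Expiration": "2099-01-01T00:00:00Z",
}


class CredentialsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != CREDENTIALS_PATH:
            self.send_error(404)
            return
        body = json.dumps(FAKE_CREDENTIALS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the stub quiet


def start_stub(port: int = 1338) -> HTTPServer:
    """Start the stub in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CredentialsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_stub_credentials(port: int) -> dict:
    """Self-check: read the credentials back from the running stub."""
    url = f"http://127.0.0.1:{port}{CREDENTIALS_PATH}"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_stub()
    input("stub listening on http://localhost:1338, press Enter to stop\n")
    server.shutdown()
```

With the stub running, the command above can be re-run unchanged to see whether the behaviour differs when the endpoint answers.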

alexey-milovidov commented 1 month ago

The bug was introduced here:

jakedaleweb commented 1 month ago

Thanks @alexey-milovidov 🙏