Could not communicate to OpenSearch, resetting connection and trying again. [404]

kentan88 commented 9 months ago

[x] read the contribution guideline

Steps to replicate

Provide example config and message

Dockerfile

# Use the fluentd base image
FROM fluent/fluentd:v1.15-debian-1

USER root

# Install necessary dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

RUN gem install faraday-net_http multi_json aws-eventstream faraday aws-sigv4 opensearch-ruby faraday_middleware-aws-sigv4 fluent-plugin-opensearch excon faraday-excon jmespath aws-partitions aws-sdk-core

# Switch back to fluent user
USER fluent

# Copy the configuration file to the Fluentd configuration directory
COPY ./config/fluent-opensearch.conf /fluentd/etc/fluent.conf

# Expose port for Fluentd
EXPOSE 24224

# Run Fluentd with the configuration file copied above.
# (On non-Docker installs it is often located at /etc/fluent/fluent.conf or /etc/td-agent/td-agent.conf.)
# The file needs an output section with the OpenSearch configuration.
CMD ["fluentd", "-c", "/fluentd/etc/fluent.conf"]

fluent.conf

<match es.**>
  @type opensearch
  logstash_format true
  include_tag_key true
  flush_interval 1s

  <endpoint>
    url https://xxxxx.ap-southeast-1.aoss.amazonaws.com
    region ap-southeast-1
    access_key_id XXXXXXXXXXXX
    secret_access_key XXXXXXXXXXXX
    aws_service_name aoss
  </endpoint>
</match>

Expected Behavior or What you need to ask

I'm running a local Docker container that uses fluent/fluentd:v1.15-debian-1 as the base image. When I run the container, I get the following messages:

2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:23 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-02-20 07:52:23 +0000 [info]: gem 'fluentd' version '1.15.3'
2024-02-20 07:52:23 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.4'
2024-02-20 07:52:23 +0000 [info]: using configuration file: <ROOT>
  <match es.**>
    @type opensearch
    <endpoint>
      url https://XXXXXXXXXXXX.ap-southeast-1.aoss.amazonaws.com/
      region "ap-southeast-1"
      access_key_id "XXXXXXXXXXXX"
      secret_access_key xxxxxx
      aws_service_name aoss
    </endpoint>
  </match>
</ROOT>
2024-02-20 07:52:23 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="3.1.3"
2024-02-20 07:52:23 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2024-02-20 07:52:23 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-02-20 07:52:24 +0000 [info]: adding match pattern="es.**" type="opensearch"
2024-02-20 07:52:26 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:26 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
2024-02-20 07:52:30 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:30 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
2024-02-20 07:52:38 +0000 [warn]: #0 Could not communicate to OpenSearch, resetting connection and trying again. [404]
2024-02-20 07:52:38 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).

I can confirm that the AWS credentials and the AWS OpenSearch Serverless endpoint are correct and that the endpoint is reachable, since I was able to send data using a Ruby OpenSearch client.

Any help would be much appreciated. ...

Using Fluentd and OpenSearch plugin versions

OS version fluent/fluentd:v1.15-debian-1
Docker
Fluentd v1.15.3
OpenSearch plugin version 1.1.4

mhkarimi1383 commented 7 months ago

I'm having the same problem with the OpenSearch K8s operator, and I have to restart the Fluentd DaemonSet every time to fix it.

mhkarimi1383 commented 7 months ago

@kentan88

Have you tried setting reload_on_failure to true? I saw this option in the README. I will test it; I think it will resolve the issue :)

mhkarimi1383 commented 7 months ago

Setting reload_on_failure to true did not fix the problem.

mhkarimi1383 commented 6 months ago


livenessProbe:
  httpGet: null
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  exec:
    command:
      - bash
      - -c
      - >
        set -ex;
        curl -s http://localhost:24231/metrics
        | grep -E "fluentd_output_status_retry_wait|fluentd_output_status_num_errors|fluentd_output_status_retry_count"
        | grep -Ev "# HELP|# TYPE"
        | grep -Ev ' 0(\.0)?$'
        | wc -l | grep -qx 0

I have added these values to the DaemonSet Helm chart. The probe should restart the container when a retry or an error occurs.

(Do not forget to install curl in your Docker image.)
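
For anyone who wants to run the same check outside the probe, here is a minimal sketch of the counting step as a standalone shell function. It compares each sample value numerically, since a text match on `0` can also hit values such as `10.0`. The function name `count_failing_outputs` is made up for illustration, and the sketch assumes each metric line ends with its value (no trailing timestamp) and that the metrics endpoint is the same `localhost:24231` used in the probe above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints how many retry/error samples have a non-zero value.
count_failing_outputs() {
  awk '
    /^#/ { next }    # skip "# HELP" and "# TYPE" comment lines
    /^fluentd_output_status_(retry_wait|num_errors|retry_count)[{ ]/ {
      # Assumption: the last whitespace-separated field is the sample value.
      if ($NF + 0 != 0) n++
    }
    END { print n + 0 }    # print 0 when nothing matched
  '
}

# Example probe body (not executed here):
#   [ "$(curl -s http://localhost:24231/metrics | count_failing_outputs)" -eq 0 ]
```

The probe would then pass only while all three metrics are zero for every output plugin, which is the same intent as the pipeline above.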
