cosmo0920 / fluent-bit-go-loki

[Deprecated] The predecessor of the fluent-bit output plugin for Loki. https://github.com/grafana/loki
Apache License 2.0

Entry out of order for stream #16

Closed dippynark closed 4 years ago

dippynark commented 5 years ago

Problem

Fluent-bit appears to be sending log messages out of order, and Loki is rejecting them. I'm finding it hard to narrow down the source of the issue. There are many rejections when fluent-bit starts up; they continue for a while but settle down quickly. I built the fluent-bit-go-loki library from master, which could have an impact; see the commit hash in the td-agent-bit output below.

Maybe related:

Steps to replicate

Fluent-bit conf

[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  systemd
    Tag   host.*

[OUTPUT]
    Name   loki
    Match  *
    Url    http://loki.monitoring.svc.cluster.local:3100/api/prom/push

Loki conf

auth_enabled: false
chunk_store_config:
  max_look_back_period: 0
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 15m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
schema_config:
  configs:
  - from: "2018-04-15"
    index:
      period: 168h
      prefix: index_
    object_store: filesystem
    schema: v9
    store: boltdb
server:
  http_listen_port: 3100
storage_config:
  boltdb:
    directory: /data/loki/index
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: true
  retention_period: 336h

Expected Behavior or What you need to ask

No out-of-order messages. Is it possible to see which logs are being rejected? I changed the log level to trace, and there was no obvious extra relevant information.

Using Fluentd and loki plugin version

cosmo0920 commented 5 years ago

is it possible to see which logs are being rejected?

No, it isn't. The Go Loki client library groups events into chunks (batches), and it doesn't provide an interface or function for confirming which individual records failed.

dippynark commented 5 years ago

@cosmo0920 I think the error can be explained by this: https://github.com/grafana/loki/issues/168#issuecomment-465665648

I am running multiple instances of fluent-bit, but they are given the same labels, as seen in the logs above. I guess this plugin should add some unique source label as per that comment, although an instance label seems more sensible IMO, since it is closer to what Prometheus does.

dippynark commented 5 years ago

I've just seen that you already support extra labels. I gave my instances unique ones, and there are still the same errors on startup (about 10), but after that there are no more (so far, at least).

cosmo0920 commented 5 years ago

Hmm... it seems that this issue is caused by Loki itself. I've also found a similar issue: https://github.com/grafana/loki/issues/898

chancez commented 4 years ago

Potentially related: https://github.com/fluent/fluent-bit/issues/1746

cosmo0920 commented 4 years ago

This repository is deprecated. Use grafana/loki's fluent-bit plugin instead. Closing.