GoogleCloudPlatform / opentelemetry-operations-go

Apache License 2.0

WAL does not support multiple exporters #654

Open cpheps opened 1 year ago

cpheps commented 1 year ago

When multiple googlecloud exporters are configured with the same WAL directory, they appear to compete for access to the WAL. This produces error logs reporting that WAL files cannot be found or have been deleted.

These are some of the error logs observed:

{
  "level": "error",
  "ts": "2023-06-28T09:12:34.996-0400",
  "caller": "collector@v0.39.2/metrics.go:605",
  "msg": "error reading WAL and exporting: remove ./storage/gcp_metrics_wal/00000000000000000176: no such file or directory",
  "kind": "exporter",
  "data_type": "metrics",
  "name": "googlecloud/gcp",
  "stacktrace": "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).runWALReadAndExportLoop\n\t/Users/cphelps/go/pkg/mod/github.com/!google!cloud!platform/opentelemetry-operations-go/exporter/collector@v0.39.2/metrics.go:605"
}
{
  "level": "error",
  "ts": "2023-06-28T09:12:35.002-0400",
  "caller": "collector@v0.39.2/metrics.go:610",
  "msg": "error watching WAL and exporting: WAL file deleted",
  "kind": "exporter",
  "data_type": "metrics",
  "name": "googlecloud/gcp",
  "stacktrace": "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/collector.(*MetricsExporter).runWALReadAndExportLoop\n\t/Users/cphelps/go/pkg/mod/github.com/!google!cloud!platform/opentelemetry-operations-go/exporter/collector@v0.39.2/metrics.go:610"
}

Here is a mocked-up example config to reproduce:

receivers:
    hostmetrics:
        collection_interval: 60s
        scrapers:
            filesystem: null
            load: null
            memory: null
            network: null
            paging: null
processors:
    batch:
exporters:
    googlecloud/proj1:
        metric:
            experimental_wal_config:
                directory: ./storage
                max_backoff: 60m
        project: proj1
    googlecloud/proj2:
        metric:
            experimental_wal_config:
                directory: ./storage
                max_backoff: 60m
        project: proj2
service:
    pipelines:
        metrics:
            receivers:
                - hostmetrics
            processors:
                - batch
            exporters:
                - googlecloud/proj1
                - googlecloud/proj2

damemi commented 1 year ago

This makes sense with the current implementation. The different exporters can't read from the same WAL because there's nothing mapping which entries go to which exporter.
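
To illustrate the failure mode, here is a minimal sketch in which two consumers try to delete the same WAL segment file. It is not the exporter's actual code: `consumeSegment` and `simulateSharedWAL` are hypothetical names, and the "exporters" are reduced to a bare `os.Remove` call. It only shows that whichever consumer comes second gets a "no such file or directory" error of the same shape as the first log entry above.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// consumeSegment is a hypothetical stand-in for an exporter that has finished
// with a WAL entry and deletes its backing file.
func consumeSegment(path string) error {
	return os.Remove(path)
}

// simulateSharedWAL writes one segment file into a temporary directory and
// lets two consumers delete it in turn. It returns the result of each
// deletion, plus any error from setting up the demo.
func simulateSharedWAL() (first error, second error, setupErr error) {
	dir, err := os.MkdirTemp("", "wal-demo")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	segment := filepath.Join(dir, "00000000000000000176")
	if err := os.WriteFile(segment, []byte("entry"), 0o600); err != nil {
		return nil, nil, err
	}

	first = consumeSegment(segment)  // the file exists, so this succeeds
	second = consumeSegment(segment) // the file is already gone
	return first, second, nil
}

func main() {
	first, second, err := simulateSharedWAL()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("first consumer: ", first)
	fmt.Println("second consumer:", second)
	fmt.Println("second is a not-exist error:", errors.Is(second, fs.ErrNotExist))
}
```

With one directory per exporter, each consumer only ever deletes files it wrote itself, so this collision cannot occur.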

You should be able to make this work using different directories like:

exporters:
    googlecloud/proj1:
        metric:
            experimental_wal_config:
                directory: ./storage/proj1
                max_backoff: 60m
        project: proj1
    googlecloud/proj2:
        metric:
            experimental_wal_config:
                directory: ./storage/proj2
                max_backoff: 60m
        project: proj2

We might be able to do this automatically by parsing the name of the exporter and using that as the file name under the directory.
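
A rough sketch of what that could look like, assuming the exporter can see its own component ID (for example `googlecloud/proj1`). The helper name `walDirFor` and the choice of replacing separators with underscores are assumptions made for illustration only, not something the exporter does today.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// walDirFor is a hypothetical helper that derives a per-exporter WAL
// directory from the configured base directory and the exporter's
// component ID, so that two exporters never share WAL files.
func walDirFor(baseDir, componentID string) string {
	// Component IDs such as "googlecloud/proj1" contain a slash, so replace
	// characters that would otherwise split or break a single path element.
	safe := strings.NewReplacer("/", "_", "\\", "_", ":", "_").Replace(componentID)
	return filepath.Join(baseDir, safe)
}

func main() {
	// Both exporters are configured with the same base directory,
	// but each ends up with its own subdirectory.
	for _, id := range []string{"googlecloud/proj1", "googlecloud/proj2"} {
		fmt.Println(walDirFor("./storage", id))
	}
}
```

Note that `filepath.Join` cleans its result, so a leading `./` in the configured directory is dropped from the returned path.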

cpheps commented 1 year ago

@damemi that makes sense. It wasn't clear at first whether they could share the same directory, the way the file_storage extension does by creating a subdirectory named after the component ID.

I think it's fine as is, but it would be nice if it automatically created something for you.