mjvankampen opened this issue 1 year ago
Even with add_unique_identifier set to true, the error still sometimes pops up.
I think I found why it still sometimes happens. This coincides with a scale-down of an AppEngine instance. This kind of makes sense if metrics are flushed on shutdown.
Is your app being run in multiple processes, e.g. the gunicorn pre-fork model?
If not, then the final flush on shutdown is the likely culprit. The shortest allowed interval for publishing metrics to Cloud Monitoring is 5s. If your app flushes metrics before shutdown and the previous export ran within the last 5s, you can see this error. Do you see any issues in your dashboards/metrics?
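As a rough mitigation sketch (not an official fix), keeping the export interval well above the 5s minimum reduces the chance that the final shutdown flush lands inside the same sampling window as the last periodic export. The 60s value below is just the reader's default, written out explicitly:

```python
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Exporting less aggressively leaves a larger gap between the last periodic
# export and the final flush that happens on shutdown.
reader = PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=60_000,
)
```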
Hey, I hope it's okay to latch onto this question. What would the answer be if the app was run under gunicorn? I'm currently testing exporting metrics to Cloud Monitoring, and during local development (with a single-process Flask development server) everything works fine. But when I deploy to Cloud Run (gunicorn with multiple workers), I frequently get One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.
I'm setting up the reader/exporter with add_unique_identifier=True, like this:
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=5000,
)
Any tips on how to avoid this? Thanks a lot!
Hi! Thanks for the lib. I see a similar issue:
One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older start time than the most recent point
This is a uvicorn app on Cloud Run with 2 containers, running 2 workers each.
A minimal repro of the code:
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

exporter = CloudMonitoringMetricsExporter(project_id=gcp_project_id)  # gcp_project_id is defined elsewhere
reader = PeriodicExportingMetricReader(exporter)
detector = GoogleCloudResourceDetector(raise_on_error=True)
provider = MeterProvider(metric_readers=[reader], resource=detector.detect())
meter = provider.get_meter(name="api")

# The meter object is injected into multiple classes, each of which creates its own instruments, e.g.:
latency = meter.create_histogram(name="api.latency", unit="ms")
I'd really like to fix this in an automatic way, but I'm not sure of the best way to go about it.
Are you able to defer the initialization of the OpenTelemetry MeterProvider until after the workers start (post-fork)?
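For reference, a minimal sketch of what that could look like with gunicorn (mentioned earlier in the thread), assuming a gunicorn.conf.py and reusing the setup from the snippet above; the post_fork hook runs once inside each worker process after it is forked:

```python
# gunicorn.conf.py -- illustrative sketch, not a drop-in fix
from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader


def post_fork(server, worker):
    # Runs in the worker process, so each worker builds its own MeterProvider
    # and (with add_unique_identifier=True) exports its own distinct time series.
    exporter = CloudMonitoringMetricsExporter(add_unique_identifier=True)
    reader = PeriodicExportingMetricReader(exporter)
    provider = MeterProvider(
        metric_readers=[reader],
        resource=GoogleCloudResourceDetector(raise_on_error=True).detect(),
    )
    metrics.set_meter_provider(provider)
```

Instruments can then be created via metrics.get_meter(...) after the provider is set, rather than at import time in the master process.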
We run on Cloud Run and also get the error Points must be written in order. One or more of the points specified had an older start time than the most recent point.
This only gets fixed by using CloudMonitoringMetricsExporter(add_unique_identifier=True). That seems odd, since our resource should already be set up with unique labels: we have 1) the Cloud Run instance id and 2) the process id:
import os

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_INSTANCE_ID, Resource, get_aggregated_resources

# Cloud Run instance id comes from the detector; the pid makes each worker unique.
resource = get_aggregated_resources(
    [GoogleCloudResourceDetector(raise_on_error=True)],
    initial_resource=Resource.create(attributes={
        SERVICE_INSTANCE_ID: f"worker-{os.getpid()}",
    }),
)

gcp_monitoring_exporter = CloudMonitoringMetricsExporter(
    add_unique_identifier=True
)
metric_reader = PeriodicExportingMetricReader(gcp_monitoring_exporter)
meter_provider = MeterProvider(
    metric_readers=[metric_reader],
    resource=resource,
)
metrics.set_meter_provider(meter_provider)
Is it expected that you always need to set add_unique_identifier=True, or how do we know when it is really required (and does it do any harm)?
I've been setting up OTel metrics with Google Cloud Monitoring for our Django app running on AppEngine.
I used code that looked like this. I thought that by using a unique instance ID I would get around the One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric issue, but it seems I really need to use add_unique_identifier=True. While I understand this for multithreaded applications or multiple exporters for the same resource, I don't understand it if the resource is already unique.