gnocchixyz / gnocchi

Timeseries database
Apache License 2.0

Swift storage backend multi regions #1067

Open benohara opened 4 years ago

benohara commented 4 years ago

Which version of Gnocchi are you using

4.3.2 (RDO stein release)

How to reproduce your problem

OpenStack deployed with multiple regions, a shared Keystone database, and a Swift ring that spans all regions.

Configured the Swift 'storage' backend with swift_container_prefix = $location_gnocchi. The documentation suggests that incoming also uses swift_container_prefix, but the code appears to suggest otherwise.

What is the result that you get

Ceilometer in each region posts its metrics to gnocchi-api. I believe they are then stored in Swift in the incoming128-X containers. The gnocchi-metricd running in each region then appears to start processing the incoming data (from all regions) before storing the metrics as $location_gnocchi.$metricid.

gnocchi-metricd in each region randomly logs that an incoming file is missing (404), presumably because another region has already picked it up and processed it?

Metrics then appear to contain a mix of data for resources from all regions.
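To illustrate the suspected race, here is a toy model only, not Gnocchi code: a plain dict stands in for one shared incoming container, and a `KeyError` stands in for Swift's 404. All names (`shared_incoming`, `list_objects`, `process`) are made up for this sketch.

```python
# Toy model only: a dict stands in for one shared Swift incoming container.
shared_incoming = {"measure-1": 10.0, "measure-2": 12.5}


def list_objects(container):
    """Snapshot the object names, like a container listing."""
    return sorted(container)


def process(container, names):
    """Read and delete each listed object; collect names that vanished."""
    processed, missing = [], []
    for name in names:
        try:
            container.pop(name)
        except KeyError:  # stands in for a 404 from Swift
            missing.append(name)
        else:
            processed.append(name)
    return processed, missing


# Both regions list the same container before either one processes it.
listing_a = list_objects(shared_incoming)
listing_b = list_objects(shared_incoming)

done_a, missing_a = process(shared_incoming, listing_a)
done_b, missing_b = process(shared_incoming, listing_b)
```

In this model the second worker finds every object it listed already gone, which would match the intermittent 404s if two regions really do read the same incoming containers.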

What is result that you expected

swift_container_prefix is prepended to the incoming128-X container name, and gnocchi-metricd processes results from that same container, thereby processing only its own region's data.

A quick look at the code suggests that prepending swift_container_prefix to sack_name would fix it?

gnocchi-config should also be prefixed, so that you can set a different number of sacks in each region?

Alternatively, one could just create a project for each region, store that region's Gnocchi data there, and forget the prefix?
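A minimal sketch of what that naming could look like, assuming the incoming128-X pattern above and a "." separator like the one in $location_gnocchi.$metricid. The helper `sack_container_name` is hypothetical and not taken from the Gnocchi source.

```python
def sack_container_name(num_sacks, sack_index, prefix=""):
    """Build an incoming container name, optionally region-prefixed.

    Hypothetical helper: the "." separator is an assumption borrowed
    from the $location_gnocchi.$metricid naming used for storage.
    """
    base = "incoming%d-%d" % (num_sacks, sack_index)
    return "%s.%s" % (prefix, base) if prefix else base


# Without a prefix, every region resolves to the same container name.
shared = sack_container_name(128, 5)

# With a per-region prefix, the names no longer collide.
region1 = sack_container_name(128, 5, prefix="region1_gnocchi")
region2 = sack_container_name(128, 5, prefix="region2_gnocchi")
```

With distinct prefixes, each region's metricd would only ever list its own incoming containers, assuming gnocchi-api writes to the same prefixed name.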

unexceptable commented 4 years ago

Another option, which could be combined with this, is to let Gnocchi in a given region use containers with a local, no-replication storage policy, so that Swift does not need to copy samples across all regions.

For long-term storage, multi-region containers are likely a good idea, but probably not for incoming data during processing. If a given cloud has such policies in place, using them may avoid some issues and extra load on Swift.

tobias-urdin commented 2 years ago

I think these metricd daemons may not share coordination and could therefore be racing to handle the incoming data. I'm not sure whether this is a bug; please provide more information.