cloudfoundry / haproxy-boshrelease

A BOSH release for haproxy (based on cf-release's haproxy job)
Apache License 2.0
38 stars 81 forks source link

Improve naming of backends when backend_servers_local declared: node and backup_node #582

Open gberche-orange opened 11 months ago

gberche-orange commented 11 months ago

Thanks for sharing this great work, here is a small suggestion as we're starting leveraging the prometheus metrics in hand with backend_servers_local.

Expected behavior

As an operator using the backend_servers_local to favor some local nodes, In order easily distinguish haproxy backends that are active backends vs the ones that are backup backends, I need the backend server name to surface differences between primary and backup nodes

Observed behavior

https://github.com/cloudfoundry/haproxy-boshrelease/blob/26ca38b7977adcc18e25b84046ae8e8ffd62142a/jobs/haproxy/templates/haproxy.config.erb#L789-L791

Given the following config

then the following configuration is generated

backend tcp-server
    mode tcp

    server node0 192.168.60.49:12379 check port 12379 inter 1000    
    server node1 192.168.64.48:12379 check port 12379 inter 1000  backup 
    server node2 192.168.64.49:12379 check port 12379 inter 1000  backup

As a result of this default sequential naming of nodes (node0, node1, node2), the haproxy prometheus metrics are harder to properly interpret because the role of each node isn't surfaced in its naming

curl -s http://127.0.0.1:9000/metrics | grep haproxy_server_bytes_in_total
# HELP haproxy_server_bytes_in_total Total number of request bytes since process started
# TYPE haproxy_server_bytes_in_total counter
haproxy_server_bytes_in_total{proxy="tcp-server",server="node0"} 8381746800
haproxy_server_bytes_in_total{proxy="tcp-server",server="node1"} 0
haproxy_server_bytes_in_total{proxy="tcp-server",server="node2"} 0

image

Suggested enhancement

Backend included into backend_servers_local use a distinct naming backup_node<backup_nodes_index> from primary backends named node<primary_nodes_index>

maxmoehl commented 11 months ago

Thanks for bringing this up!

Do you have a preference whether the index should be continuous or separate between primary and backup nodes?

gberche-orange commented 11 months ago

Thanks @maxmoehl for your prompt feedback !

After discussion within our team, we finally have a preference for a single continuous index shared between primary and backup nodes. This might also be easier to implement and maintain.

in the example https://github.com/cloudfoundry/haproxy-boshrelease/issues/582#issue-2035733436 this would result into

node0
backup_node1
backup_node2
maxmoehl commented 11 months ago

This might also be easier to implement and maintain.

Which is a big plus given the current state of the template file 😄

The one issue I see is that this could be considered a breaking change as the metrics and log output (depending on the configuration) will change. One way around that would be a feature flag, but I really don't want yet another configuration property.

Major Version (X.y.z) -- incremented if any backwards incompatible changes are introduced to the public API

Our definition of a major version doesn't really help here as we never got around to specifying what is part of the "public API". I'll try to get some alignment on this.

gberche-orange commented 11 months ago

+1 for creating a new major version as to ensure clients read the release note and notice potential change in metrics and log output if they use backend_servers_local property

I did not find traces of usage of the property within the two main related github orgs

maxmoehl commented 11 months ago

When cutting a major version we just have to figure out which version will become our maintenance version. We usually wan't to support at least two minor versions of HAProxy but this would mean we keep supporting v12 and v14 forcing people on v13 to upgrade. But I don't see this as road-blocker.

I'd say you can start working on this, if you like, and we figure out the versioning details once the change is in :)

maxmoehl commented 8 months ago

@gberche-orange is there any update on this?

gberche-orange commented 8 months ago

Hi @maxmoehl, sorry for late response. I won't be able to prioritize the contribution discussed in this issue in the near future. If priority allows in my team in the future, then I'll return to this issue with comment and hopefully a PR. Sorry I can't help more.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs within 15 days.