SUSE / hanadb_exporter

Prometheus exporter for SAP HANA databases
Apache License 2.0

Handling HANA scale-out/failover use cases #94

Closed: elturkym closed this 2 years ago

elturkym commented 3 years ago

SCALE-OUT/FAILOVER

The exporter installation fails on all nodes if it cannot connect to the master node.
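One way to keep a node from failing outright when the master is unreachable is to treat the HANA connection as optional: try a few times, then return nothing instead of raising, so the HTTP endpoint still comes up with the default process/Python metrics. The sketch below assumes a hypothetical injected `connect(host, port)` callable standing in for the real driver call; it is not the exporter's actual code.

```python
import logging
from typing import Callable, Optional, TypeVar

Conn = TypeVar("Conn")

logger = logging.getLogger("exporter_sketch")  # hypothetical logger name


def connect_or_degrade(
    host: str,
    port: int,
    connect: Callable[[str, int], Conn],  # hypothetical stand-in for the driver call
    retries: int = 3,
) -> Optional[Conn]:
    """Try to connect to HANA; return None instead of raising if every attempt fails."""
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            return connect(host, port)
        except Exception as err:  # driver-specific exception types are not assumed here
            last_error = err
            logger.warning(
                "connection attempt %d/%d to %s:%d failed: %s",
                attempt, retries, host, port, err,
            )
    # Caller can still start the HTTP server and serve default metrics only.
    logger.error(
        "could not reach %s:%d; serving default metrics only (last error: %s)",
        host, port, last_error,
    )
    return None
```

Under this assumption, the caller would register a HANA collector only when a connection object comes back, and otherwise keep serving the default collector output.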

Examples:

curl response from the worker node imdbworker02, which returns only the default Python process metrics:

[ec2-user@imdbworker02 ~]$ curl localhost:9668
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 215.0
python_gc_objects_collected_total{generation="1"} 326.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 91.0
python_gc_collections_total{generation="1"} 8.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="8",patchlevel="4",version="3.8.4"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 4.07556096e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.3619968e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.62826749985e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.22999999999999998
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0

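Every family in that response (python_gc_*, python_info, process_*) comes from the default collectors that the Python Prometheus client registers, so the worker exposes no HANA data at all. A quick way to check a scrape for this is to list the `# TYPE` families that fall outside those prefixes. This is a minimal sketch that assumes any other prefix is an exporter-specific metric.

```python
from typing import Set

# Prefixes produced by the Python client's default process, platform and GC collectors.
DEFAULT_PREFIXES = ("python_", "process_")


def non_default_families(exposition: str) -> Set[str]:
    """Return metric family names from '# TYPE' lines that are not default collector metrics."""
    families: Set[str] = set()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line.startswith("# TYPE "):
            continue
        parts = line.split()  # ["#", "TYPE", "<name>", "<type>"]
        if len(parts) >= 4 and not parts[2].startswith(DEFAULT_PREFIXES):
            families.add(parts[2])
    return families
```

An empty result for the body of `urllib.request.urlopen("http://localhost:9668").read().decode()` would indicate a node serving only default metrics, as in the output above.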
elturkym commented 3 years ago

Hi @arbulu89,

Thanks,

Mohamed

elturkym commented 3 years ago

I have moved the secrets manager changes to this new pull request https://github.com/SUSE/hanadb_exporter/pull/97 as recommended.

I will keep this PR for scale-out handling.

elturkym commented 3 years ago

Hi @arbulu89,

I have updated this PR with the new modifications for scale-out handling; I reused the same PR to keep the conversation history.

This approach depends on starting the exporter with the master host:

Unit tests and documentation are still pending; I will add them while addressing your review comments on these changes.
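A database manager handling scale-out could decide, per node, whether the local host is the current master and only then collect HANA metrics. The sketch below assumes landscape rows shaped like the HOST and INDEXSERVER_ACTUAL_ROLE columns of HANA's M_LANDSCAPE_HOST_CONFIGURATION monitoring view (verify the column names for your HANA version); `LandscapeHost`, `find_master` and `should_collect_hana_metrics` are hypothetical names, and no query or driver code is shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LandscapeHost:
    """Assumed shape of one landscape row (HOST, INDEXSERVER_ACTUAL_ROLE)."""
    host: str
    indexserver_actual_role: str  # e.g. "MASTER", "SLAVE", "STANDBY"


def find_master(rows: Iterable[LandscapeHost]) -> Optional[str]:
    """Return the host whose index server currently acts as master, or None."""
    masters = [r.host for r in rows if r.indexserver_actual_role.upper() == "MASTER"]
    if len(masters) > 1:
        raise ValueError(f"expected at most one master, got {masters}")
    return masters[0] if masters else None


def should_collect_hana_metrics(local_host: str, rows: Iterable[LandscapeHost]) -> bool:
    """Only the node currently hosting the master collects HANA metrics."""
    return find_master(rows) == local_host
```

With hypothetical rows for "imdbmaster01" (MASTER) and "imdbworker02" (SLAVE), only imdbmaster01 would collect. Comparing against the local hostname assumes the HOST column matches the node's short hostname.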

arbulu89 commented 3 years ago

Hi @elturkym, I'm back at work. I will look at this in the first days of the week and get back to you with my feedback.

elturkym commented 3 years ago

Hi @elturkym, I've left many comments below. I think we need to rethink many things.

  • I think many parts of the code must be moved to the database manager, which should handle scale-out connections.

I have replied to all the comments; we can discuss them offline.

  • We should most probably return some metric for standby nodes; otherwise they don't do anything, and I don't see why we should collect their information.

It returns the Python metrics mentioned in the description, e.g. python_info{implementation="CPython",major="3",minor="8",patchlevel="4",version="3.8.4"} 1.0

  • I don't really like the idea of connecting to the master node from all the active nodes. If that is the case, don't they all return the same values? Is that logical, or am I missing something? If the data is duplicated, maybe we should just return a metric stating each node's role and only return data from the master (the first thing that came to mind).

I expect only one active master node at a time, so the data should not be duplicated. Currently, workers return only some Python metrics, as mentioned in the description. Let me know if you think we should add a specific metric indicating that a node is a standby node; I am not sure what the cost of that would be.
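The role metric discussed above could be as cheap as one extra gauge family per node, using the common "enum as gauge" pattern: one sample per known role, set to 1 for the current role and 0 otherwise, so a standby node still reports something meaningful. The sketch below renders it directly in the Prometheus text exposition format with only the standard library; the metric name `hanadb_host_role` and the role list are hypothetical.

```python
from typing import Iterable

KNOWN_ROLES = ("master", "slave", "standby")  # assumed role values


def _escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_role_metric(host: str, actual_role: str, roles: Iterable[str] = KNOWN_ROLES) -> str:
    """Render a hypothetical hanadb_host_role gauge: 1.0 for the current role, 0.0 otherwise."""
    current = actual_role.lower()
    host_label = _escape_label(host)
    lines = [
        "# HELP hanadb_host_role Current index server role of this HANA host (1 = current role).",
        "# TYPE hanadb_host_role gauge",
    ]
    for role in roles:
        value = 1.0 if role == current else 0.0
        lines.append(f'hanadb_host_role{{host="{host_label}",role="{_escape_label(role)}"}} {value}')
    return "\n".join(lines) + "\n"
```

For imdbworker02 in standby, this yields three samples with role="standby" set to 1.0. The Python Prometheus client also provides an `Enum` metric type that implements the same idea, which would avoid hand-rendering the text format.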

lee-martin commented 2 years ago

@arbulu89 and @stefanotorresi, is there any update on where we are with this? We had planned to ship this and https://github.com/SUSE/hanadb_exporter/pull/97; see https://jira.suse.com/browse/SLE-20632.

elturkym commented 2 years ago

Hi All,

I am going to close this PR. Since priorities have changed on our end, I won't be able to continue working on this feature this year.

Thanks so much for your help with this PR and with https://github.com/SUSE/hanadb_exporter/pull/97 as well.

Looking forward to working with you again.

Best regards,

Mohamed Elturky