Include metrics for load balance servers

sboschman commented 3 years ago

Feature request to expose load balance metrics (virtual servers and real servers).

api endpoint: /firewall/load-balance

Expected metrics (proposal):

Per-VirtualServer and VDOM:

fortigate_lb_virtualserver_info

Per-RealServer for each VirtualServer and VDOM:

fortigate_lb_realserver_info

fortigate_lb_realserver_mode

fortigate_lb_realserver_status

fortigate_lb_realserver_active_sessions

fortigate_lb_realserver_bytes_processed

I am working on adding these metrics and opening a PR for it.

Just one question about how to handle the mandatory 'count' param for this endpoint.

count *required integer(query) | Maximum number of entries to return.

Is it acceptable to just use a fixed value, e.g. 999? (is this enough? people with more than 999 virtual servers configured?, make it 9999?) Or should it be a complex batching solution, using the start and count params, and quering in batches of e.g. 250.

/firewall/load-balance?start=0&count=250
/firewall/load-balance?start=250&count=250
etc

Any ideas on your part are welcome.

bluecmd commented 3 years ago

Hi!

Sounds nice, looking forward to the PR :-). Definitely think this is something we should have. If you haven't already, take a look at https://prometheus.io/docs/practices/naming/ - we try to follow it as close as we can.

You can browse through old PRs and see that generally the feedback we give is on metric naming and what should/should not be labels etc. So if you spend some time thinking on what makes sense and maybe even ask in #prometheus IRC channel for some feedback, the review will be much smoother. Thinking of it, @secustor we should probably add that to some sort of CONTRIBUTOR.md or something to help people out where to discuss metric naming.

Or should it be a complex batching solution, using the start and count params, and quering in batches of e.g. 250.

I would prefer if the API has a pagination, as it seems to, that we use it. If you think it would be too hard to add, then at the very least output a warning that too many loadbalancers were found and not all will be scraped, as well as file an enhancement issue here on Github so we can follow up on that limitation in the future.

secustor commented 3 years ago

A Contributor.md would be definitely useful.

Regarding pagination this is heavenly dependent how big the resulting objects are.
This is more of a gut feeling then knowledge, but I think a page should not have more than 10k key value pairs. In case we run we run into problems later on this should be easy to optimize later Thinking about the implementation, it would be probably something like this:


pageSize := 100
entryIndex := 0 
combinedResults := []Results{}

do {
    response := queryAPI(entryIndex,pageSize);
    append(combinedResults, response.Results ... ) 
    entryIndex += pageSize
} while (pageSize == len(response.results));

// do something with the results

sboschman commented 3 years ago

Maybe I am overthinking this, but isn't part of the reason to use pagination to limit memory usage on the client side? Stitching all pages together before processing feels like an out of memory waiting to happen.

So, I was looking into adding a BatchGet/GetPages function to the FortiHTTP clent. This BatchGet/GetPages function takes a callback function as argument. The pagination logic would be the responsibility of the fortiTokenClient. Tthe individual probeXXX functions can focus on creating metrics from the api results, not really caring that the fortiTokenClient uses multiple calls to get all the results.

To test the probes a 'fakeClient' exists. This means two BatchGet/GetPages implementations, with the fortiTokenClient implementation not hit during the probe tests.

After some digging around I found httpmock (https://github.com/jarcoal/httpmock). Using httpmock we can eliminate the fakeClient and just use the fortiTokenClient in all probeXXX tests. Which would make adding a BatchGet/GetPages function a lot easier I think.

Just some code thrown together to use httpmock, without any attempt to create one/two lines of reusable setup code:

func TestSystemStatus(t *testing.T) {
    httpmock.Activate()
    defer httpmock.DeactivateAndReset()

    httpmock.RegisterResponder("GET", "https://fortigate.local/api/v2/monitor/system/status",
        httpmock.NewBytesResponder(200, prepare("testdata/status.jsonnet")))

    tgt, _ := url.Parse("https://fortigate.local")
    authMap[tgt.String()] = Auth{Token: "test"}

    c, err := newFortiClient(context.Background(), *tgt, &http.Client{})
    if err != nil {
        t.Fatalf("Failed to create client. %v", err)
    }
    // c := newFakeClient()
    // c.prepare("api/v2/monitor/system/status", "testdata/status.jsonnet")

    r := prometheus.NewPedanticRegistry()
    if !testProbe(probeSystemStatus, c, r) {
        t.Errorf("probeSystemStatus() returned non-success")
    }

       ...

Which raises the following questions:

whether or not to use a batch/page function on the client with a callback instead of implementing pagination in multiple probeXXX functions
should I, before adding batch/page functionality, replace the fakeClient with httpmock

bluecmd commented 3 years ago

I am all for using httpmock if it makes things easier, but I would have to look closely on httpmock to see if it does things like arbitrary parameter ordering.

I won't stop you from proposing a grander pagination structure, but wouldn't it be easier to get the metric in first - and then potentially refactor the pagination? Having both changes in the same PR seems like it would be a massive PR, and doing it the other way would leave yourself (and others) hanging on us agreeing on an abstraction.

Creating a testdata/xxx-page1.jsonnet + testdata/xxx-page2.jsonnet would probably suffice for now, is my gut feeling.

EDIT: Is pagination a thing we will need in existing metrics? I wonder if we have missed that in the past, or if this is a new thing for only LB

sboschman commented 3 years ago

Is pagination a thing we will need in existing metrics? I wonder if we have missed that in the past, or if this is a new thing for only LB

probeFirewallPolicies: api/v2/cmdb/firewall/policy and probeHAStatistics: api/v2/cmdb/system/ha

These endpoints also have a start and count query param, but both are optional; can't find what the default for count is for these specific endpoints... Somewhere else in the api docs for firewall/internet-service-details they state:

count: Maximum number of entries to return. Valid range is [20, 1000]; if a value is specified out of that range, it will be rounded up or down. Default value is 1000.

Wouldn't be surprised if that default of 1000 items also holds true for firewall/policy endpoint, guess nobody using this exporter has a fortigate unit with 1000+ fw policy rules defined.

secustor commented 3 years ago

Reduction of memory usage is definitely also an use case. But when I had to implement pagination, it has been more because the transfer data has become so big that I ran into timeouts. Further it reduces the amount of data which is lost and has to be re fetched in case of a network hiccup.

I agree with @bluecmd that we should split this up and track the discussion regarding the mock framework and a refactoring of the FortiHTTP class in a separate issue.

Regarding mock frameworks I throw gock into the ring. I have worked with its inspiration ( nock ) but not with it itself.

sboschman commented 3 years ago

Already switched to gock as httpmock seems limited in query params matching. It uses an exact match and a sorted match. Unfortunately vdom=* get its value uri encoded in the sorting process and fails to match afterwards... The gock api requires a bit more refactoring of the current code, so hacking around a bit with it at the moment

sboschman commented 3 years ago

one possible way to use gock for testing can be seen in #53

bluecmd / fortigate_exporter

Include metrics for load balance servers #47