HewlettPackard / lustre_exporter

Prometheus exporter for use with the Lustre parallel filesystem
Apache License 2.0
34 stars 51 forks source link

Coalesce some statistics into a single metric #87

Closed joehandzik closed 7 years ago

joehandzik commented 7 years ago

Signed-Off-By: Joe Handzik joseph.t.handzik@hpe.com

Fixes #64

joehandzik commented 7 years ago

I fully expect this PR to need some major cleanup before merging. It's ugly and hacky, but it does work. So, thoughts on the general concept and the result that it comes up with would be appreciated.

roclark commented 7 years ago

I don't see anything wrong or bad about this PR, though I think we can certainly make it more efficient. As a quick brain exercise, I created this semi-pseudo function using a lot of your code in the getJobStatsByOperation function:

func getJobStatsOperation(job string, jobID string) (metricList []lustreJobsMetric, err error) {
    opList := []string{"open", "close", "mknod", "link", "unlink", "mkdir", "rmdir", "rename", "getattr", "setattr", "getxattr", "setxattr", "statfs", "sync", "samedir_rename", "crossdir_rename", "punch", "destroy", "create", "get_info", "set_info", "quotactl"}
    for operation := range opList {
        opStat := regexCaptureString(operation+": .*", jobBlock)
        if len(opStat) < 1 {
            continue
        }
        opNumbers := regexCaptureStrings("[0-9]*.[0-9]+|[0-9]+", opStat)
        if len(opNumbers) < 1 {
            continue
        }
        result, err := strconv.ParseUint(strings.TrimSpace(opNumbers[0]), 10, 64)
        if err != nil {
            return nil, err
        }
        extraLabelValue = operation
        helpText = jobStatsHelp
        l := lustreStatsMetric{
            title:           promName,
            help:            helpText,
            value:           result,
            extraLabel:      "operation",
            extraLabelValue: extraLabelValue,
        }
        metricList = append(metricList, lustreJobsMetric{jobID, l})     
    }

    return metricList, err
}

With this, we can replace parseJobStatsText with something along the lines of:

func parseJobStatsText(jobStats string, promName string, helpText string) (metricList []lustreJobsMetric, err error) {
    jobs := regexCaptureStrings("(?ms:job_id:.*?(-|\\z))", jobStats)
    if len(jobs) < 1 {
        return nil, nil
    }

    for _, job := range jobs {
        jobID, err := getJobNum(job)
        if err != nil {
            return nil, err
        }
        if helpText == jobStatsHelp {
            jobList, err := getJobStatsOperation(job, jobID)
            if err != nil {
                return nil, err
            }
        } else {
            jobList, err := getJobStatsIO(job, jobID, promName, helpText)  // <- Note, the original getJobStatsByOperation will be renamed to getJobStatsIO
            if err != nil {
                return nil, err
            }
        }
        for _, item := range jobList {
            metricList = append(metricList, item)
        }
    }
    return metricList, nil
}

The same principles would apply for the stats and md_stats files. I think this would simplify some of the logic and improve performance while reducing redundancies.

For performance, we can just parse the file once to get all metrics instead of once per metric. As in, if we have a file with 10 operations in it, with this method, we can just parse it once and move on instead of parsing it 10 times. This probably makes more sense with the metrics being bundled together anyway.

This will also help reduce redundancies by removing all of the help and type declarations for every metric. We won't need to specify every helpText for the operations, especially since we aren't even using that text anymore.

Just a few quick thoughts of mine, let me know if you have any questions or comments on this. Or, if you think we could save something like this for another PR.

joehandzik commented 7 years ago

This is a WIP at the moment, unit tests are going to be broken. I'll let you know when it's ready for re-review.

joehandzik commented 7 years ago

@roclark When you have a chance, take a look and let me know what you think. This is fully updated and representative of what I would like to merge. Feel free to be nitpicky about names and such.

joehandzik commented 7 years ago

@roclark This is done from my perspective. Let me know if you think any of the comments you've left here are dealbreakers.

roclark commented 7 years ago

Looks like we just need to remove the operation label in the stats_integration_test.go file, then everything should work fine. Once done, you good to merge?

joehandzik commented 7 years ago

@roclark Ha, I pushed before lunch assuming I had bigger problems. Looks like it was that simple. Merge if you're ready.