bluecmd / spectrum_virtualize_exporter

Prometheus exporter for IBM Spectrum Virtualize compatible SANs

Add support for monitoring IBM SVC (Spectrum virtual controller) #3

Open matejzero opened 3 years ago

matejzero commented 3 years ago

Hey,

we have an IBM SVC in front of V5000 and FlashSystems. I checked your exporter and it works on the V5000, but not on the SVC itself, since some commands are different on SVCs.

For example, lsnodecanisterstats does not exist, but you can use lssystemstats to get the same output for cluster-wide metrics, or lsnodestats for per-node output. lssystemstats doesn't have the node_id and node_name columns, but lsnodestats does.

Example:

>lssystemstats
stat_name          stat_current stat_peak stat_peak_time
compression_cpu_pc 9            10        210303073135
cpu_pc             8            10        210303073300
fc_mb              965          2154      210303073120
fc_io              207019       279810    210303073005
sas_mb             0            0         210303073351
sas_io             0            0         210303073351
iscsi_mb           0            0         210303073351
iscsi_io           0            0         210303073351
write_cache_pc     34           36        210303073315
total_cache_pc     79           80        210303073330

>lsnodestats
node_id node_name stat_name          stat_current stat_peak stat_peak_time
1       SVCLOC1   compression_cpu_pc 10           10        210303073812
1       SVCLOC1   cpu_pc             11           13        210303073622
1       SVCLOC1   fc_mb              398          677       210303073622
1       SVCLOC1   fc_io              62987        79065     210303073622
1       SVCLOC1   sas_mb             0            0         210303073812
1       SVCLOC1   sas_io             0            0         210303073812
1       SVCLOC1   iscsi_mb           0            0         210303073812

No change needed:

A few commands don't exist / make sense on SVC:

Output of lssystem can be used to determine the product type. There is a product_name field:

lssystem | grep product_name
product_name IBM SAN Volume Controller
product_name IBM Storwize V5000

There are some other metrics one could monitor on SVC storage, such as lsquorum, for alerting when quorum is down, since quorum is both storage and network based.

>lsquorum
quorum_index status id name           controller_id controller_name active object_type override site_id site_name
0            online 0  FS900_VDISK_00 0             FS900           no     mdisk       no       1       LOC1
1            online 53 mdisk21        3             IBM5030-1       no     mdisk       no       2       LOC2
3            online                                                 yes    device      no               host.example.org/192.168.4.4
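A minimal Go sketch of how the lsquorum rows could be turned into values suitable for alerting. It assumes the REST API returns lsquorum as a JSON array whose keys match the CLI column names and whose values are strings; that shape has not been confirmed for lsquorum. The names quorumRow, parseQuorum, quorumStatusValue and activeQuorumOnline are hypothetical and not part of the exporter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quorumRow holds the lsquorum columns used here. The JSON keys are an
// assumption: they simply reuse the CLI column names.
type quorumRow struct {
	QuorumIndex string `json:"quorum_index"`
	Status      string `json:"status"`
	Active      string `json:"active"`
	ObjectType  string `json:"object_type"`
}

// parseQuorum decodes a JSON array of quorum rows.
func parseQuorum(data []byte) ([]quorumRow, error) {
	var rows []quorumRow
	if err := json.Unmarshal(data, &rows); err != nil {
		return nil, err
	}
	return rows, nil
}

// quorumStatusValue maps a row to 1 (online) or 0 (anything else),
// which is a convenient shape for a gauge.
func quorumStatusValue(q quorumRow) float64 {
	if q.Status == "online" {
		return 1
	}
	return 0
}

// activeQuorumOnline reports whether the active quorum device is online.
func activeQuorumOnline(rows []quorumRow) bool {
	for _, q := range rows {
		if q.Active == "yes" && q.Status == "online" {
			return true
		}
	}
	return false
}

func main() {
	sample := []byte(`[
		{"quorum_index": "0", "status": "online", "active": "no", "object_type": "mdisk"},
		{"quorum_index": "3", "status": "online", "active": "yes", "object_type": "device"}
	]`)
	rows, err := parseQuorum(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, q := range rows {
		fmt.Printf("quorum_index=%s object_type=%s active=%s online=%g\n",
			q.QuorumIndex, q.ObjectType, q.Active, quorumStatusValue(q))
	}
	fmt.Println("active quorum online:", activeQuorumOnline(rows))
}
```

In an exporter, the per-row value would likely become a gauge labelled by quorum_index and object_type, with an alert on the active device going offline.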

Not sure how useful it is, but an info metric could be generated from lsnode output with labels such as

Is this something you would be interested in implementing? Just asking; in case you don't want to or don't have time, I will write my own exporter in Python (I don't know Go, so I can't extend yours). I'm more than happy to test and provide example REST outputs if needed.

bluecmd commented 3 years ago

Hello, we should definitely add this!

Could you tell me whether you want to monitor only the SVC (i.e. it proxies all the metrics for the V5000), or whether monitoring the SVC would be in addition to the V5000?

Do you know if this controller thing exists as a virtual machine? If I could find a trial of it I could try to run it against my V7000; otherwise we can probably work with you providing the JSON dumps and reviewing my suggested code.

matejzero commented 3 years ago

I think it should only be an addition, since one can attach any block storage to an SVC and use it as a storage source.

I don't think the SVC is available as a VM; it only comes as an appliance.

V5000/V7000, SVC and FlashSystem provide similar commands with almost identical output. As I wrote at the top of the ticket, for starters you could just check which system is being polled and, based on that, use the appropriate command or, in the case of SVC, skip collecting lsenclosurestats, lsdrive and lsenclosurepsu metrics, as they don't generate any output on SVC.
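The idea could be sketched in Go roughly as follows. This is only an illustration: collectPlan and planFor are hypothetical names, the product_name strings are the two shown in the lssystem example earlier in this ticket, and the substring match on "SAN Volume Controller" is an assumption about how other SVC models report themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// collectPlan lists which commands to query for one system.
type collectPlan struct {
	StatsCommand      string   // per-node statistics command
	EnclosureCommands []string // enclosure/drive commands; empty on SVC
}

// planFor picks commands from the product_name reported by lssystem.
func planFor(productName string) collectPlan {
	if strings.Contains(productName, "SAN Volume Controller") {
		// SVC: use lsnodestats and skip the enclosure commands,
		// which are reported to produce no output there.
		return collectPlan{StatsCommand: "lsnodestats"}
	}
	return collectPlan{
		StatsCommand:      "lsnodecanisterstats",
		EnclosureCommands: []string{"lsenclosurestats", "lsdrive", "lsenclosurepsu"},
	}
}

func main() {
	for _, name := range []string{"IBM SAN Volume Controller", "IBM Storwize V5000"} {
		p := planFor(name)
		fmt.Printf("%s: stats=%s enclosure=%v\n", name, p.StatsCommand, p.EnclosureCommands)
	}
}
```

Keeping the decision in one function means a new product type only needs another branch, rather than checks scattered through each collector.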

I'm a bit disappointed that IBM doesn't expose more detailed metrics over the API. There is an option to enable collection of statistics (startstats), but the XML is only available over SCP. There are detailed statistics for mdisk, vdisk, node, cache, cluster, FC, iSCSI,... But we have to work with what we have :)

I'll try to generate some JSON tomorrow if you need it. And thanks for being willing to work on this :)

olemyk commented 2 years ago

Hi @bluecmd

I'm trying to get this exporter to work against an SVC (8.5 code).

From what I can see, lsnodecanisterstats now has "0.x" values in the field, so I'm getting this error message:

2022/09/21 19:34:32 Loaded 2 API credentials
2022/09/21 19:34:32 Spectrum Virtualize exporter running, listening on ":9747"
2022/09/21 19:34:35 Error: json: cannot unmarshal number 0.150 into Go struct field nodeStat.stat_current of type int
2022/09/21 19:34:35 Probe of "https://10.33.7.56:7443" failed, took 0.293 seconds

I'm not that familiar with Go, but from what I understand, it's StatCurrent being an int that's the issue here? Any help is appreciated.

    type nodeStat struct {
        NodeID      string `json:"node_id"`
        StatName    string `json:"stat_name"`
        StatCurrent int    `json:"stat_current,string"`
    }
    var st []nodeStat

    if err := c.Get("rest/v1/lsnodecanisterstats", "", &st); err != nil {
        log.Printf("Error: %v", err)
        return false
    }

    for _, s := range st {
        if s.StatName == "compression_cpu_pc" {
            mCmpCPU.WithLabelValues(s.NodeID).Set(float64(s.StatCurrent) / 100.0)
        } else if s.StatName == "cpu_pc" {
            mSysCPU.WithLabelValues(s.NodeID).Set(float64(s.StatCurrent) / 100.0)


    {
        "node_id": "1",
        "node_name": "SVC02_N1_75ARCC0",
        "stat_name": "vdisk_w_io",
        "stat_current": "332",
        "stat_peak": "567",
        "stat_peak_time": "220921152635"
    },
    {
        "node_id": "1",
        "node_name": "SVC02_N1_75ARCC0",
        "stat_name": "vdisk_w_ms",
        "stat_current": "0.193",
        "stat_peak": "0.816",
        "stat_peak_time": "220921152715"
    },
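One possible way around the unmarshal error, sketched under the assumption that every stat_current value is a quoted number as in the JSON above: declare StatCurrent as float64 and keep the `,string` option, which encoding/json also supports for floating-point fields. parseNodeStats is a hypothetical helper used only to keep the example self-contained; it is not a function from the exporter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeStat mirrors the struct quoted above, except that StatCurrent is a
// float64 so fractional values such as "0.193" can be decoded.
type nodeStat struct {
	NodeID      string  `json:"node_id"`
	StatName    string  `json:"stat_name"`
	StatCurrent float64 `json:"stat_current,string"`
}

// parseNodeStats decodes a JSON array of stat rows. Fields that are not
// declared in nodeStat (node_name, stat_peak, ...) are ignored.
func parseNodeStats(data []byte) ([]nodeStat, error) {
	var st []nodeStat
	if err := json.Unmarshal(data, &st); err != nil {
		return nil, err
	}
	return st, nil
}

func main() {
	sample := []byte(`[
		{"node_id": "1", "stat_name": "vdisk_w_io", "stat_current": "332"},
		{"node_id": "1", "stat_name": "vdisk_w_ms", "stat_current": "0.193"}
	]`)
	st, err := parseNodeStats(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range st {
		fmt.Printf("node=%s %s=%g\n", s.NodeID, s.StatName, s.StatCurrent)
	}
}
```

With this change the existing `float64(s.StatCurrent) / 100.0` conversions would still compile, since converting a float64 to float64 is a no-op.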