bufanda / zabbix--template-xenorchestra

These are templates for the Zabbix monitoring solution to monitor VM resources in XenOrchestra

Backup discovery #6

Open billcouper81 opened 2 months ago

billcouper81 commented 2 months ago

With the version of this template that I downloaded, I had some issues with backup discovery on Zabbix 7.

I don't remember the specifics, but I had to make changes to the JavaScript for the backup items. I don't remember changing the discovery rule script, but the 'get job info' script was definitely changed.

I'm posting this here in case it helps in any way. It works for me, but may not work for others. I am on Zabbix 7.0.3 currently.

=====> discovery (untouched?):

// Discover backup jobs: list the VM backup job URLs, then fetch each job for its id and name.
try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/jobs/vm'));
} catch(error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/jobs/vm Error: " + error);
    if (!Number.isInteger(error)) {
        return 520;
    }
}
var retVal = [];
var jobUrls = JSON.parse(response); // parse the job list once, not on every iteration
for (var r = 0; r < jobUrls.length; r++) {
    var resp = req.get(encodeURI(server_url + jobUrls[r])); // one extra request per job
    var backupInfo = JSON.parse(resp);
    retVal.push({'url': jobUrls[r], 'uuid': backupInfo.id, 'name': backupInfo.name});
}
return JSON.stringify(retVal);

=====> get job info:

try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/logs'));
} catch(error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/logs Error: " + error);
    if (!Number.isInteger(error))
        return 520;
}
var backupLogUrls = JSON.parse(response);
var fullBackupLogs = [];
// Fetch only the 20 most recent logs; Math.max avoids a negative index when there are fewer than 20.
for (var i = Math.max(0, backupLogUrls.length - 20); i < backupLogUrls.length; i++) {
    var logUrl = backupLogUrls[i];
    var resp = req.get(encodeURI(server_url + logUrl));
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    fullBackupLogs.push(backupLog);
}
// Walk newest to oldest and return the first finished run of this job.
for (var i = fullBackupLogs.length - 1; i >= 0; i--) {
    var backupLog = fullBackupLogs[i];
    if (backupLog.jobId == params.uuid && backupLog.status != 'pending') {
        return JSON.stringify({
            "url": backupLog.url,
            "params": params.uuid,
            "id": backupLog.id,
            'name': backupLog.jobName,
            'status': backupLog.status,
            'start': backupLog.start,
            'end': backupLog.end
        });
    }
}
return 404;

bufanda commented 2 months ago

Yeah, backup was never fully implemented, hence it being disabled by default. 😉

Also, the way I wanted it to work was never feasible, as Zabbix always timed out or limited the memory the JavaScript had to run with, and therefore killed the script prematurely.

I always meant to remove the backup discovery altogether, but if you have a better solution, maybe make a PR for it?

billcouper81 commented 2 months ago

I have made, and continue to make, many adjustments and improvements to the template. For example, I need to run passive checks against some of the discovered hosts and VMs, so I extended the discovery to pull the IPv4 address of each one and automatically create an Agent interface on the discovered hosts (VMs and servers), since you cannot manually add an interface to a discovered host.
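A minimal sketch of the IPv4 idea. It assumes that each VM object returned by XenOrchestra carries `uuid`, `name_label` and `mainIpAddress` fields, which you should verify against your XO version. It also assumes the host prototype's Agent interface uses a hypothetical `{#IPV4}` macro. The function only builds the discovery rows. Fetching the VM objects is left to the existing discovery script.

```javascript
// Sketch: build Zabbix LLD rows that carry each VM's IPv4 address.
// ASSUMPTION: VM objects expose `uuid`, `name_label` and `mainIpAddress`;
// `{#VM_UUID}`, `{#VM_NAME}` and `{#IPV4}` are hypothetical macro names.
function buildVmDiscoveryRows(vms) {
    // Dotted-quad shape check only; it does not range-check the octets.
    var ipv4 = /^\d{1,3}(\.\d{1,3}){3}$/;
    var rows = [];
    for (var i = 0; i < vms.length; i++) {
        var vm = vms[i];
        var ip = vm.mainIpAddress || '';
        rows.push({
            '{#VM_UUID}': vm.uuid,
            '{#VM_NAME}': vm.name_label,
            // Empty when there is no usable IPv4 address
            // (e.g. no guest tools, or only an IPv6 address).
            '{#IPV4}': ipv4.test(ip) ? ip : ''
        });
    }
    return rows;
}

// Example with mock data:
var rows = buildVmDiscoveryRows([
    { uuid: 'a1', name_label: 'web01', mainIpAddress: '10.0.0.5' },
    { uuid: 'b2', name_label: 'appliance', mainIpAddress: 'fe80::1' }
]);
console.log(JSON.stringify(rows));
```

Leaving the macro empty, rather than dropping the VM, keeps appliances without an agent in the discovery. It is worth testing how your Zabbix version handles a host prototype interface with an empty address.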

Maybe I should just export a copy of my current template and you can look at what I messed with? :)

bufanda commented 2 months ago

I would recommend using Zabbix autoregistration and/or network discovery to create hosts that are monitored by an agent. This template was inspired by the VMware template and is intended solely to monitor the VMs and hypervisors from the orchestration point of view, not the hosts themselves.

If you want to use it for more, sure, go ahead and add whatever you need. From my side, though, I would like to keep this design as is and only expose the metrics you get from XenOrchestra, as the VMs can run any operating system and can even be appliances (prebuilt VMs) that never come with an agent.

bufanda commented 1 month ago

I've tested your code on my installation regarding backup discovery. I have about 40 VMs with 17 backup plans configured and 833 log entries, and I had to set the timeout to 60 seconds so Zabbix doesn't abort. In Zabbix 6 I always had issues with timeouts past 30 seconds. Either way, in larger installations (and mine isn't large by any means) it's still a hassle to have backup discovery enabled, especially as the logs keep growing over time.

So I'm not sure your solution changes my initial impression that backup discovery is best left out unless something more complex, such as a plugin on the server side, is implemented.

Edit: On my installation it takes about 45 seconds on average to fetch the data for one job, and sometimes it even times out, so it takes more than 60 seconds.
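One way to keep a long-running script inside Zabbix's timeout is to give it its own time budget and stop issuing requests before the limit is reached. This is a minimal sketch. `fetchWithinBudget` and its parameters are hypothetical names, and the HTTP call is passed in as a function so the logic can be tried outside Zabbix. Inside Zabbix, that function would wrap `req.get`.

```javascript
// Sketch: fetch URLs one by one, but stop before an overall time budget
// runs out instead of letting Zabbix kill the script at its timeout.
// `fetchFn` is injected; inside Zabbix: function (url) { return req.get(url); }
// `now` is optional and only exists so the clock can be faked in tests.
function fetchWithinBudget(urls, fetchFn, budgetMs, now) {
    now = now || function () { return Date.now(); };
    var started = now();
    var results = [];
    for (var i = 0; i < urls.length; i++) {
        if (now() - started > budgetMs) {
            // Out of time: return what we have and flag it as partial.
            return { complete: false, results: results };
        }
        results.push(fetchFn(urls[i]));
    }
    return { complete: true, results: results };
}

// Example: a fake clock that advances 20 "ms" per call and a 50 ms budget.
var fakeTime = 0;
var out = fetchWithinBudget(
    ['/a', '/b', '/c', '/d'],
    function (url) { return 'body of ' + url; },
    50,
    function () { fakeTime += 20; return fakeTime; }
);
console.log(out.complete, out.results.length);
```

The budget is only checked between requests, so a single slow request can still overrun it. The budget should therefore sit comfortably below the item's configured timeout.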

billcouper81 commented 1 month ago

I am fairly new to Zabbix... how do I find out how long my discovery scripts take to run?

bufanda commented 1 month ago

Once the discovery has run, you can go to Data collection -> Hosts -> Items, click one of the "Get job info" items, and use the Test button at the bottom. Then just run the test: it actually gets values from the host and processes them as it would during a normal run. Time it with a stopwatch. That's the easiest way.

Otherwise, you will see messages like this in the Zabbix server (or proxy) log:

264:20240912:132545.029 item "xenorchestra:xoa.backup.raw[54dff5f8-432d-44ad-a269-6d0248ce39aa]" became not supported: Cannot execute script: Error: cannot get URL: Timeout was reached.
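Besides testing the item with a stopwatch, the script can time itself and write the result to the Zabbix log. This is a minimal sketch. `timed` and `logMessage` are hypothetical helpers. Inside Zabbix they log with `Zabbix.log` at level 3, like the scripts in this thread. Outside Zabbix they fall back to `console.log`, so the sketch can be tried with Node.

```javascript
// Sketch: measure how long a piece of the script takes and log it.
// Uses Zabbix.log(3, ...) when running inside Zabbix, console.log otherwise.
function logMessage(msg) {
    if (typeof Zabbix !== 'undefined') {
        Zabbix.log(3, msg);
    } else {
        console.log(msg);
    }
}

function timed(label, fn) {
    var start = Date.now();
    try {
        return fn();
    } finally {
        // Runs whether fn() returned normally or threw.
        logMessage('XenOrchestra ' + label + ' took ' + (Date.now() - start) + ' ms');
    }
}

// Example: wrap a (here trivial) piece of work.
var total = timed('sum', function () {
    var s = 0;
    for (var i = 1; i <= 1000; i++) { s += i; }
    return s;
});
```

Inside a Zabbix script, a request could be wrapped as `var response = timed('GET backup logs', function () { return req.get(url); });`, where `url` is whatever the script already requests. The timings then show up next to the timeout messages in the log.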
billcouper81 commented 1 month ago

Ahhh, I see! My XO host has one of these items for each job, "Backup Job Name: Get job info", with the key xoa.backup.raw[job-uuid-here].

I can click Test on the item and then the "Get value and test" button. Timing it with a stopwatch, it takes between 10 and 12 seconds. The result is the expected JSON string containing info about the job, such as status, start and end.

Our environments are obviously different, and I'm not sure what affects the amount of time this task needs. I think that with the discovery code I am running, the more jobs you have, the longer it will take, but the number of job logs has no impact. Maybe.

edit: I just noticed something in my 'get job info' script that I had forgotten. My script only processes the 20 most recent backup job logs; since I have fewer than 20 jobs, that seemed more than enough. This cuts the script time dramatically compared with the template's default script, but the change rests on assumptions about my exact setup, where every backup job runs once a day. I will see if I can come up with something better and more robust that works for more configurations, but the API is not great, as you are no doubt aware :)
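The "most recent 20 logs" shortcut can be made safe for installations with fewer logs, and it can stop as soon as a match is found. This is a minimal sketch. Like the scripts in this thread, it assumes the log URL list is ordered oldest to newest. `findLatestLog`, `fetchLog` and `maxScan` are hypothetical names, and `fetchLog` stands in for `JSON.parse(req.get(encodeURI(server_url + url)))`.

```javascript
// Sketch: walk the newest log entries first, at most `maxScan` of them,
// and return the first finished log that belongs to `jobId`.
// ASSUMPTION: logUrls is ordered oldest -> newest.
function findLatestLog(logUrls, fetchLog, jobId, maxScan) {
    // Math.max guards against a negative start index when there are
    // fewer than maxScan logs in total.
    var stop = Math.max(0, logUrls.length - maxScan);
    for (var i = logUrls.length - 1; i >= stop; i--) {
        var log = fetchLog(logUrls[i]); // one request per log actually inspected
        if (log.jobId === jobId && log.status !== 'pending') {
            log.url = logUrls[i];
            return log;
        }
    }
    return null; // not found within the scanned window
}

// Example with mock logs: the newest job-a run is still pending.
var fakeLogs = {
    '/logs/1': { jobId: 'job-a', status: 'success', start: 1 },
    '/logs/2': { jobId: 'job-b', status: 'success', start: 2 },
    '/logs/3': { jobId: 'job-a', status: 'pending', start: 3 }
};
var found = findLatestLog(['/logs/1', '/logs/2', '/logs/3'],
    function (url) { return fakeLogs[url]; }, 'job-a', 20);
console.log(found && found.url);
```

Fetching lazily inside the loop means a job that ran recently costs only a few requests. A job that has not run within the window still costs `maxScan` requests and returns `null`.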

billcouper81 commented 1 month ago

I was playing with the Get Job Info script today. Try this version? It only checks as many logs as it needs before it finds the matching uuid. With my setup of every job running once every day, this is faster (now 3-11 seconds for any job).

EDIT: Updated the script to use log filtering. It's even better and should be compatible with any job schedule? Hmm, not sure what would happen if any jobs are disabled... a problem for future us.

"Get Job Info" script:

try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/logs?filter=jobId:' + params.uuid));
} catch(error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/logs Error: " + error);
    if (!Number.isInteger(error))
        return 520;
}
var backupLogUrls = JSON.parse(response);
for (var i = backupLogUrls.length -1; i >= 0; i--) {
    var logUrl = backupLogUrls[i];
    var resp = req.get(encodeURI(server_url + logUrl));
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    if (backupLog.jobId == params.uuid) {
        return JSON.stringify({
            "url": backupLog.url,
            "params": params.uuid,
            "id": backupLog.id,
            'name': backupLog.jobName,
            'status': backupLog.status,
            'start': backupLog.start,
            'end': backupLog.end
        });
    }
}
return 404; 
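The filtered script above still makes one extra request per log entry. The XO REST API accepts a `fields` query parameter on collection endpoints, which makes it return objects instead of URLs. Whether `/rest/v0/backup/logs` honours it on your XO version is an assumption worth testing. If it does, a single request returns the logs, and the newest finished run can be picked by its `start` timestamp instead of relying on array order. This is a minimal sketch of that selection step. `pickLatestRun` is a hypothetical name.

```javascript
// Sketch: choose the most recent non-pending run from a list of log objects.
// ASSUMPTION: the list came from something like
//   /rest/v0/backup/logs?fields=id,jobId,jobName,status,start,end&filter=jobId:<uuid>
// and `start` is a numeric timestamp.
function pickLatestRun(logs, jobId) {
    var latest = null;
    for (var i = 0; i < logs.length; i++) {
        var log = logs[i];
        if (log.jobId !== jobId || log.status === 'pending') {
            continue; // other job, or still running
        }
        if (latest === null || log.start > latest.start) {
            latest = log;
        }
    }
    if (latest === null) {
        return null;
    }
    return {
        id: latest.id,
        name: latest.jobName,
        status: latest.status,
        start: latest.start,
        end: latest.end
    };
}

// Example: logs deliberately out of order, newest one still pending.
var run = pickLatestRun([
    { id: 'l1', jobId: 'j1', jobName: 'Nightly', status: 'success', start: 100, end: 160 },
    { id: 'l3', jobId: 'j1', jobName: 'Nightly', status: 'pending', start: 300 },
    { id: 'l2', jobId: 'j1', jobName: 'Nightly', status: 'failure', start: 200, end: 230 }
], 'j1');
console.log(JSON.stringify(run));
```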
billcouper81 commented 1 month ago

I revised the "Get Job Info" script. Removed the for loop and added additional error checking.

try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader("Cookie: authenticationToken=" + params.token);
    var server_url = params.url.endsWith("/") ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + "/rest/v0/backup/logs?filter=jobId:" + params.uuid));
    if (!response) {
        Zabbix.log(3, "XenOrchestra API Error: Empty response for URL " + server_url + "/rest/v0/backup/logs?filter=jobId:" + params.uuid);
        return 501;
    }
    var backupLogUrls = JSON.parse(response);
    if (backupLogUrls.length === 0) {
        Zabbix.log(3, "XenOrchestra API Error: No backup logs found for job Id " + params.uuid);
        return 502;
    }
} catch (error) {
    Zabbix.log(3, "XenOrchestra API Error: " + error);
    return 503;
}

try {
    var logUrl = backupLogUrls[backupLogUrls.length - 1];
    var resp = req.get(encodeURI(server_url + logUrl));
    if (!resp) {
        Zabbix.log(3, "XenOrchestra API Error: Empty response for log URL " + logUrl);
        return 504;
    }
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    if (backupLog.jobId === params.uuid) {
        return JSON.stringify({
            "url": backupLog.url,
            "params": params.uuid,
            "id": backupLog.id,
            "name": backupLog.jobName,
            "status": backupLog.status,
            "start": backupLog.start,
            "end": backupLog.end
        });
    }
    return 505;
} catch (error) {
    Zabbix.log(3, "XenOrchestra API Error: " + error);
    return 506;
}
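The revised script signals failures by returning 501 to 506, which end up stored as item values that look like data. In a Zabbix script, throwing an error instead makes the item "not supported" and shows the message as the reason, as in the timeout log line quoted earlier in this thread. This is a minimal sketch of that alternative for the parsing step. `parseLogList` is a hypothetical helper, not part of the template.

```javascript
// Sketch: validate the backup-log list and throw on problems instead of
// returning numeric codes. In Zabbix a thrown error marks the item as
// not supported and the message is shown as the error reason.
function parseLogList(response, jobId) {
    if (!response) {
        throw new Error('Empty response for backup logs of job ' + jobId);
    }
    var urls;
    try {
        urls = JSON.parse(response);
    } catch (e) {
        throw new Error('Backup log list for job ' + jobId + ' is not valid JSON: ' + e.message);
    }
    if (!Array.isArray(urls) || urls.length === 0) {
        throw new Error('No backup logs found for job ' + jobId);
    }
    return urls;
}

// Example: a valid list parses; an empty list throws.
var urls = parseLogList('["/rest/v0/backup/logs/1"]', 'job-a');
var message = '';
try {
    parseLogList('[]', 'job-a');
} catch (e) {
    message = e.message;
}
console.log(urls.length, message);
```

The trade-off is that the item stops collecting values until the error clears, which is usually what you want for a broken API call. If you want to alert on a specific failure, keeping the numeric codes and mapping them with a value map remains an option.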
bufanda commented 1 month ago

Sorry for the delayed response. I'll check your latest implementation and come back to you.

bufanda commented 1 month ago

I've tested it for a while now and it looks good to me. If you have any recommendations for items the discovery should create, I'll start adding them to my template. I saw your dashboard in #5; if you'd like to share it, we could add it to the template too.
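As one possible item: the "Get job info" JSON already contains `start` and `end`, so a dependent item could derive the run duration with a JavaScript preprocessing step. This is a minimal sketch. It assumes both fields are epoch timestamps in milliseconds, which you should verify against your XO logs. The item name and the function name are hypothetical. Only the function body would go into the preprocessing step, where the master item's JSON arrives as `value`.

```javascript
// Sketch of a JavaScript preprocessing step for a hypothetical dependent
// item "Backup job duration" on the "Get job info" master item.
// In Zabbix only the function body is pasted; `value` is the master item's JSON.
// ASSUMPTION: start/end are epoch milliseconds.
function backupDurationSeconds(value) {
    var job = JSON.parse(value);
    if (typeof job.start !== 'number' || typeof job.end !== 'number') {
        // Run still in progress (no end yet), or an error code instead of JSON.
        throw new Error('Job ' + job.id + ' has no start/end timestamps');
    }
    return Math.round((job.end - job.start) / 1000);
}

// Example: a run that took 95.5 seconds.
var seconds = backupDurationSeconds(JSON.stringify({
    id: 'l2', status: 'success', start: 1726140000000, end: 1726140095500
}));
console.log(seconds);
```

A second dependent item could extract `$.status` with a JSONPath step, so a trigger can fire when the last run did not succeed.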