Yeah, backup discovery was never fully implemented, hence it being disabled by default. 😉
Also, the way I wanted it to work never turned out to be feasible, as Zabbix always timed out or capped the memory the JavaScript had to run with and therefore killed the script prematurely.
I always meant to remove the backup discovery altogether, but if you have a better solution, maybe make a PR for it?
I have been making, and continue to make, many adjustments/improvements to the template. For example, I need to run passive checks against some of the discovered hosts and VMs, so I extended the discovery to pull the IPv4 address of each and automatically create the Agent interface on the discovered hosts (VMs and servers), since you cannot manually add an interface to a discovered host. Roughly, it works like the sketch below.
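This is only a rough sketch of that discovery change: the /rest/v0/vms endpoint and the uuid / name_label / mainIpAddress field names are assumptions based on my XO version, so check what your REST API actually returns before relying on them. The extra ipv4 field is what the host prototype's Agent interface gets built from.

// Sketch: VM discovery that also exposes an IPv4 address for the host prototype.
// Assumption: XO lists VMs at /rest/v0/vms and each VM record carries
// uuid, name_label and mainIpAddress (verify against your XO version).
var params = JSON.parse(value);
var req = new HttpRequest();
req.addHeader('Cookie: authenticationToken=' + params.token);
var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;

var vmUrls = JSON.parse(req.get(encodeURI(server_url + '/rest/v0/vms')));
var retVal = [];
for (var i = 0; i < vmUrls.length; i++) {
    var vm = JSON.parse(req.get(encodeURI(server_url + vmUrls[i])));
    retVal.push({
        'uuid': vm.uuid,                  // job for the host prototype's LLD macros
        'name': vm.name_label,            // visible host name
        'ipv4': vm.mainIpAddress || ''    // empty if the guest tools report no address
    });
}
return JSON.stringify(retVal);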
Maybe I should just export a copy of my current template and you can look at what I messed with? :)
I would recommend using Zabbix autoregistration and/or network discovery to create hosts that are monitored by an agent. This template was inspired by the VMware template and is intended solely to monitor the VMs and hypervisors from the orchestration point of view, not the hosts themselves.
If you want to use it for more, sure, go ahead and add whatever you need, but from my side I would like to keep this design as is and only expose the metrics you get out of XenOrchestra, since the VMs can run any operating system and may even be appliances (pre-built VMs) that never come with an agent.
I've tested your code on my installation regarding backup discovery. I have about 40 VMs with 17 backup plans configured and 833 log entries, and I had to set the timeout to 60 seconds so Zabbix doesn't abort. In Zabbix 6 I always had issues with timeouts past 30 seconds, but even so, in larger installations (and mine isn't large by any means) it's still a hassle to have backup discovery enabled, especially as the logs keep growing over time.
So I'm not sure your solution changes my initial impression and opinion: backup discovery is better ignored unless something more complex is implemented, such as a plugin on the server side.
Edit: on my installation it takes about 45 seconds on average to fetch the data for one job, and sometimes it even times out, i.e. takes more than 60 seconds.
I am fairly new to Zabbix... how do I debug how long my discovery scripts are running for?
Once the discovery has run, you can go to Data collection -> Hosts -> Items,
click on one of the "get job" items, and use the Test button at the bottom. Then just run the test: it actually fetches values from the host and processes them as it would during a normal run. Then just use a stopwatch. That's the easiest way.
Otherwise you will see messages like this in the server log:
264:20240912:132545.029 item "xenorchestra:xoa.backup.raw[54dff5f8-432d-44ad-a269-6d0248ce39aa]" became not supported: Cannot execute script: Error: cannot get URL: Timeout was reached.
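If you want an actual number instead of a stopwatch, the script can also time itself. A minimal sketch, reusing the same value/params convention the template scripts already use and timing just a single request against /rest/v0/backup/logs:

// Sketch: log how long one XenOrchestra REST call takes from inside a script item.
var started = Date.now();

var params = JSON.parse(value);
var req = new HttpRequest();
req.addHeader('Cookie: authenticationToken=' + params.token);
var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
var response = req.get(encodeURI(server_url + '/rest/v0/backup/logs'));

Zabbix.log(3, 'XenOrchestra /rest/v0/backup/logs fetch took ' + (Date.now() - started) + ' ms');
return response;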
Ahhh, I see! My XO host has one of these items for each job, "Backup Job Name: Get job info", with key xoa.backup.raw[job-uuid-here].
I can click Test on the item and then the "Get value and test" button. Timing it with a stopwatch, it takes between 10 and 12 seconds. The result is the expected JSON string containing info about the job, like status/start/end etc.
Our environments are obviously different and I'm not sure what determines the amount of time this task needs. I think that with the discovery code I am running, the more jobs you have the longer it will take, but the number of job logs has no impact. Maybe.
Edit: I just noticed something in my "get job info" script that I had forgotten about. My script only processes the 20 most recent backup job logs; since I have fewer than 20 jobs, that seemed more than enough. This cuts the script time dramatically compared to the template's default script, but the change relies on assumptions about my exact setup, where every backup job runs once a day. I will see if I can come up with something better and more robust that works for more configurations, but the API doesn't make it easy, as you are no doubt aware. :)
I was playing with the "Get Job Info" script today. Try this version? It only fetches as many logs as it needs until it finds the matching UUID. With my setup of every job running once a day, this is faster (now 3-11 seconds for any job).
EDIT: updated the script with log filtering. Even better, and it should be compatible with any job scheduling? Hmm, not sure what would happen if any jobs are disabled... a problem for future us.
"Get Job Info" script:
try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/logs?filter=jobId:' + params.uuid));
} catch (error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/logs Error: " + error);
    if (!Number.isInteger(error))
        return 520;
}
var backupLogUrls = JSON.parse(response);
for (var i = backupLogUrls.length - 1; i >= 0; i--) {
    var logUrl = backupLogUrls[i];
    var resp = req.get(encodeURI(server_url + logUrl));
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    if (backupLog.jobId == params.uuid) {
        return JSON.stringify({
            'url': backupLog.url,
            'params': params.uuid,
            'id': backupLog.id,
            'name': backupLog.jobName,
            'status': backupLog.status,
            'start': backupLog.start,
            'end': backupLog.end
        });
    }
}
return 404;
I revised the "Get Job Info" script. Removed the for loop and added additional error checking.
try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader("Cookie: authenticationToken=" + params.token);
    var server_url = params.url.endsWith("/") ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + "/rest/v0/backup/logs?filter=jobId:" + params.uuid));
    if (!response) {
        Zabbix.log(3, "XenOrchestra API Error: Empty response for URL " + server_url + "/rest/v0/backup/logs?filter=jobId:" + params.uuid);
        return 501;
    }
    var backupLogUrls = JSON.parse(response);
    if (backupLogUrls.length === 0) {
        Zabbix.log(3, "XenOrchestra API Error: No backup logs found for job Id " + params.uuid);
        return 502;
    }
} catch (error) {
    Zabbix.log(3, "XenOrchestra API Error: " + error);
    return 503;
}
try {
    var logUrl = backupLogUrls[backupLogUrls.length - 1];
    var resp = req.get(encodeURI(server_url + logUrl));
    if (!resp) {
        Zabbix.log(3, "XenOrchestra API Error: Empty response for log URL " + logUrl);
        return 504;
    }
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    if (backupLog.jobId === params.uuid) {
        return JSON.stringify({
            "url": backupLog.url,
            "params": params.uuid,
            "id": backupLog.id,
            "name": backupLog.jobName,
            "status": backupLog.status,
            "start": backupLog.start,
            "end": backupLog.end
        });
    }
    return 505;
} catch (error) {
    Zabbix.log(3, "XenOrchestra API Error: " + error);
    return 506;
}
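Since the item just returns the small JSON object built above (url, params, id, name, status, start, end), the individual values are easy to pick up with dependent items. A minimal sketch of a JavaScript preprocessing step for a hypothetical "job status" dependent item (a JSONPath step like $.status would do the same):

// Sketch: dependent-item preprocessing that extracts one field from the
// "Get job info" JSON above; swap 'status' for 'start', 'end', 'name', etc.
var job = JSON.parse(value);
return job.status;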
Sorry for the delayed response. I will check your latest implementation and come back to you.
I've now tested it for a while and it looks good to me. If you have any recommendations for items the discovery should create, I'll start adding them to my template. I also saw your dashboard in #5; if you'd like to share it, we could add it to the template too.
With the version of this template that I downloaded, I had some issues with backup discovery on Zabbix 7.
I don't remember the specifics, but I had to make changes to the JavaScript for the backup stuff. I don't remember changing the discovery rule script, but the "get job info" script was definitely changed.
I'm posting this here in case it helps in any way. It works for me, but may not work for others. I am on Zabbix 7.0.3 currently.
=====> discovery (untouched?):
try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/jobs/vm'));
} catch (error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/jobs/vm Error: " + error);
    if (!Number.isInteger(error)) {
        return 520;
    }
}
var jobUrls = JSON.parse(response);
var retVal = [];
for (var r = 0; r < jobUrls.length; r++) {
    var resp = req.get(encodeURI(server_url + jobUrls[r]));
    var backupInfo = JSON.parse(resp);
    retVal.push({'url': jobUrls[r], 'uuid': backupInfo.id, 'name': backupInfo.name});
}
return JSON.stringify(retVal);
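For reference, this discovery script returns an array shaped like the example below (the values here are made-up placeholders), which the LLD rule then maps to its macros (something along the lines of {#UUID} and {#NAME}, whatever the template defines):

[
    {
        "url": "/rest/v0/backup/jobs/vm/00000000-0000-0000-0000-000000000000",
        "uuid": "00000000-0000-0000-0000-000000000000",
        "name": "Example nightly VM backup"
    }
]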
=====> get job info:
try {
    var params = JSON.parse(value);
    var req = new HttpRequest();
    req.addHeader('Cookie: authenticationToken=' + params.token);
    var server_url = params.url.endsWith('/') ? params.url.slice(0, -1) : params.url;
    var response = req.get(encodeURI(server_url + '/rest/v0/backup/logs'));
} catch (error) {
    Zabbix.log(3, "XenOrchestra API " + server_url + "/rest/v0/backup/logs Error: " + error);
    if (!Number.isInteger(error))
        return 520;
}
var backupLogUrls = JSON.parse(response);
var fullBackupLogs = [];
// Only fetch the 20 most recent logs; the start index is clamped to 0 in case there are fewer than 20.
for (var i = Math.max(0, backupLogUrls.length - 20); i < backupLogUrls.length; i++) {
    var logUrl = backupLogUrls[i];
    var resp = req.get(encodeURI(server_url + logUrl));
    var backupLog = JSON.parse(resp);
    backupLog.url = logUrl;
    fullBackupLogs.push(backupLog);
}
// Walk backwards so the newest non-pending log for this job wins.
for (var i = fullBackupLogs.length - 1; i >= 0; i--) {
    var backupLog = fullBackupLogs[i];
    if (backupLog.jobId == params.uuid && backupLog.status != 'pending') {
        return JSON.stringify({
            'url': backupLog.url,
            'params': params.uuid,
            'id': backupLog.id,
            'name': backupLog.jobName,
            'status': backupLog.status,
            'start': backupLog.start,
            'end': backupLog.end
        });
    }
}
return 404;