Closed TheNetworkGuy closed 4 days ago
I am one who would appreciate this. There are several common fields that could be used with both VMs and devices. I'm currently in the process of planning out the mapping, and in our world I could use:
tenant, site_group, site, dev_role for both devices and VMs
Edit: would it be possible to use a custom field(s) to fill in the gaps if other fields wanted to be used?
@TheNetworkGuy What is your thought about enabling VM syncing?
Given the limited set of fields supported on both, for an initial implementation, if VMs are used with the script, the user must use fields that are supported in both models.
A future enhancement request could be added to allow for separate mapping of devices and VMs.
@Kage1 I did lay a foundation for further work on this feature request with the latest update. The script now uses modules instead of a single file, which makes the initial implementation of VM syncing much easier.
VM syncing will be a beta feature for sure for quite some time.
Work has started on this request. First stop is the hostgroup generation logic, which needs to be completely rewritten in order to support two different types of data structures and to differentiate between VMs and devices.
+1 for this feature please.
+1 This would be a great enhancement
+1 This feature would definitely be appreciated! Looking forward to seeing how this enhancement progresses.
I added the ability to sync VMs here: https://github.com/TheNetworkGuy/netbox-zabbix-sync/pull/76
I'm sorry if the code is a bit spaghetti, but it was done quickly. If I have time (and I remember) I will refactor :))
An update from the void: I haven't lost track of this feature request. Personally, things have been busy, but I'm very happy that you are all showing me that this feature is definitely priority number one.
The reason why I haven't merged pull request #76 is the same as the author described above: I don't think the code has all of the features and customizability that the project deserves. Furthermore, I also started on several pieces of the code earlier this year and would like to follow that up, since it provides a solid base to build on.
I'll have more free time next week, so I'll probably post some updates or even a branch for you all to test with! :)
Oh yeah, I agree that my PR https://github.com/TheNetworkGuy/netbox-zabbix-sync/pull/76 is lacking and I don't think it should be merged now, but it's a quick workaround for anyone who needs it like I do/did.
I'm done with separating the hostgroup logic from the device class. This was crucial, since hostgroup generation needs to be a module shared by devices and VMs. I've also implemented a system where VMs and physical devices share a "default" list of hostgroup attributes. For instance, VMs have clusters, which devices do not have, but they share things such as sites and (nested) site groups.
Next step is building the basic VM sync capabilities.
I've done some initial work today and I think that the current setup works pretty well with VM syncing. I'm looking for some volunteers to test the code. You can help me out by switching to the virtual_machines branch and pulling the latest version of that code. I'm looking forward to your responses, bugs and feature requests.
I switched my compose file to the virtual_machines branch, but when trying to do a pull I'm getting "Error manifest unknown".
Here's the image path: ghcr.io/thenetworkguy/netbox-zabbix-sync:virtual_machines
Hey @Kage1, that is correct: there is currently no Docker image available for the branch code :) Unfortunately, you'll have to either build the container yourself or run the code locally until it's pushed to the main branch.
I'll look into the options for building one, but my priority was mainly the actual code rather than the deployment options :)
@TheNetworkGuy I expanded my knowledge today... I git cloned the VM branch and built my own container from the Dockerfile.
I updated config.py for the VM sync and hostgroup format; here's the relevant config. I have verified that tenant/site_group/site/role all exist for VMs, so it should be OK.
sync_vms = True
vm_hostgroup_format = "tenant/site_group/site/role"
Here's the output when I run the script. I'm not sure why it's bombing on "location" when the format doesn't call for it.
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | Traceback (most recent call last):
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 243, in <module>
netbox-zabbix-sync | main(args)
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 138, in main
netbox-zabbix-sync | vm.set_hostgroup(vm_hostgroup_format,netbox_site_groups,netbox_regions)
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/virtualMachine.py", line 22, in set_hostgroup
netbox-zabbix-sync | hg = Hostgroup("vm", self.nb, self.nb_api_version,
netbox-zabbix-sync | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/hostgroups.py", line 20, in __init__
netbox-zabbix-sync | self._set_format_options()
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/hostgroups.py", line 51, in _set_format_options
netbox-zabbix-sync | if self.nb.location:
netbox-zabbix-sync | ^^^^^^^^^^^^^^^^
netbox-zabbix-sync | File "/usr/local/lib/python3.12/site-packages/pynetbox/core/response.py", line 308, in __getattr__
netbox-zabbix-sync | raise AttributeError('object has no attribute "{}"'.format(k))
netbox-zabbix-sync | AttributeError: object has no attribute "location"
netbox-zabbix-sync exited with code 1
I see, the location property is a physical object related to devices only. A VM can be assigned to a site or cluster, but not to a location. A location is also a child of a parent site object. I made the mistake of assuming that a site is also part of a location; it's not, it's part of a region. I've modified the code so that the formatting works properly.
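Since pynetbox raises AttributeError for fields a model simply does not have, this class of bug can be guarded against generically. A minimal sketch, where the safe_attr helper is hypothetical and not the script's actual code:

```python
def safe_attr(nb_obj, name):
    """Return the string form of an optional NetBox field, or None.

    pynetbox records raise AttributeError for fields that do not exist
    on the model (e.g. `location` on a virtual machine), so the lookup
    is wrapped instead of accessed directly.
    """
    try:
        value = getattr(nb_obj, name)
    except AttributeError:
        return None
    return str(value) if value else None
```

With a guard like this, a shared hostgroup formatter can treat device-only fields as simply absent on VMs.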
Hmm, I tested the new branch and it looks like it can't find my NetBox config context. I can see that device.set_template() uses templates_config_context and templates_config_context_overrule, while the new vm.get_templates_context() does not; maybe that's it? I am still trying to investigate/fix.
Did you pull the latest branch? I've made some bug fixes in the past two hours. It is correct that the function for VMs is different: the set_vm_template() function is called to skip custom field lookups in the device_type model (since it does not exist for VMs). The set_template() function has been unaffected, just like the get_templates_context() function, which is identical for both VMs and devices.
FYI, I've used the following context on one of my test VM objects:
{
    "zabbix": {
        "templates": [
            "ICMP Ping"
        ]
    }
}
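A context like this could be read with a small lookup that mirrors the script's "Key 'zabbix' not found" warning seen later in this thread; the function below is an illustrative sketch, not the project's actual implementation:

```python
import logging

logger = logging.getLogger("Netbox-Zabbix-sync")

def templates_from_context(config_context, host_name):
    """Return the template names from a host's rendered config context."""
    if "zabbix" not in config_context:
        # Mirrors the warning the script logs for hosts without the key.
        logger.warning("Key 'zabbix' not found in config context "
                       "for template host %s", host_name)
        return []
    return config_context["zabbix"].get("templates", [])
```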
I just pulled the latest a few mins ago, built the new docker and got this:
I checked and tenant does exist in the VM model.
Container netbox-zabbix-sync Created
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | Traceback (most recent call last):
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 245, in <module>
netbox-zabbix-sync | main(args)
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 138, in main
netbox-zabbix-sync | vm.set_hostgroup(vm_hostgroup_format,netbox_site_groups,netbox_regions)
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/virtual_machine.py", line 30, in set_hostgroup
netbox-zabbix-sync | hg = Hostgroup("vm", self.nb, self.nb_api_version)
netbox-zabbix-sync | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/hostgroups.py", line 17, in __init__
netbox-zabbix-sync | self._set_format_options()
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/hostgroups.py", line 53, in _set_format_options
netbox-zabbix-sync | format_options["tenant_group"] = str(self.tenant.group) if self.nb.tenant else None
netbox-zabbix-sync | ^^^^^^^^^^^
netbox-zabbix-sync | AttributeError: 'Hostgroup' object has no attribute 'tenant'
netbox-zabbix-sync exited with code 1
@Kage1 Fixed the bug and pushing code to the repo right now. Thank you for helping out and your patience thus far.
I've got another one, this VM doesn't have a config context nor does it have a Zabbix template assigned to it. So there shouldn't be an attempt to sync it.
As a side note, I'm using tags to generate the context for the device vs assigning a template directly to the device/VM.
@Kage1 Fixed the bug and pushing code to the repo right now. Thank you for helping out and your patience thus far.
You're welcome. This is a bit self-serving, since this update will reduce the onboarding of our new Zabbix install considerably.
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | 2024-10-30 12:47:14,432 - Netbox-Zabbix-sync - WARNING - Key 'zabbix' not found in config context for template host VM01
netbox-zabbix-sync | 2024-10-30 12:47:14,433 - Netbox-Zabbix-sync - DEBUG - Host VM01: Starting inventory mapper
netbox-zabbix-sync | Traceback (most recent call last):
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 245, in <module>
netbox-zabbix-sync | main(args)
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 140, in main
netbox-zabbix-sync | vm.set_inventory(nb_vm)
netbox-zabbix-sync | File "/opt/netbox-zabbix/modules/device.py", line 183, in set_inventory
netbox-zabbix-sync | value = value[item] if value else None
netbox-zabbix-sync | ~~~~~^^^^^^
netbox-zabbix-sync | File "/usr/local/lib/python3.12/site-packages/pynetbox/core/response.py", line 323, in __getitem__
netbox-zabbix-sync | return dict(self)[k]
netbox-zabbix-sync | ~~~~~~~~~~^^^
netbox-zabbix-sync | KeyError: 'asset_tag'
netbox-zabbix-sync exited with code 1
@Kage1 Thanks for your findings.
This shows that unit tests are becoming almost required, since there are so many different configurations. I didn't use inventory mapping, so I never tested it on my dev system.
I've pushed some new code. Quite a big change this is: not only a completely new way of syncing with new modules etc., but also a completely new hostgroup generation system. And both have their bugs :P
Getting closer: the script isn't bombing out now, but I'm wondering why it's touching VMs that have no config context or template to sync.
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | 2024-10-31 07:23:37,172 - Netbox-Zabbix-sync - DEBUG - Host VM01: started operations on VM.
netbox-zabbix-sync | 2024-10-31 07:23:37,631 - Netbox-Zabbix-sync - WARNING - Host VM01: Key 'zabbix' not found in config context for template
netbox-zabbix-sync | 2024-10-31 07:23:37,633 - Netbox-Zabbix-sync - DEBUG - Host VM02: started operations on VM.
netbox-zabbix-sync | 2024-10-31 07:23:38,027 - Netbox-Zabbix-sync - WARNING - Host VM02: Key 'zabbix' not found in config context for template
netbox-zabbix-sync | 2024-10-31 07:23:38,028 - Netbox-Zabbix-sync - DEBUG - Host VM03: started operations on VM.
netbox-zabbix-sync | 2024-10-31 07:23:38,425 - Netbox-Zabbix-sync - WARNING - Host VM03: Key 'zabbix' not found in config context for template
Here's the sync operation with the context applied to VM01.
Config Context
{
    "zabbix": {
        "interface_port": 10500,
        "interface_type": 1,
        "proxy": "lab01",
        "proxy_group": "Lab",
        "templates": [
            "Lab ICMP Ping"
        ]
    }
}
The host was created in Zabbix and the HostID was written back into Netbox. Looking good.
Question: Why are there three loops to create the group structure in Zabbix?
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | 2024-10-31 07:42:31,946 - Netbox-Zabbix-sync - DEBUG - Host VM01: started operations on VM.
netbox-zabbix-sync | 2024-10-31 07:42:33,030 - Netbox-Zabbix-sync - INFO - Hostgroup 'MyTenant/MySiteGroup/MySite/MyRole': created in Zabbix.
netbox-zabbix-sync | 2024-10-31 07:42:33,151 - Netbox-Zabbix-sync - INFO - Hostgroup 'MyTenant/MySiteGroup/MySite': created in Zabbix.
netbox-zabbix-sync | 2024-10-31 07:42:33,277 - Netbox-Zabbix-sync - INFO - Hostgroup 'MyTenant/MySiteGroup/': created in Zabbix.
netbox-zabbix-sync | 2024-10-31 07:42:33,367 - Netbox-Zabbix-sync - DEBUG - Host VM01: matched group MyTenant/MySiteGroup/MySite/MyRole
netbox-zabbix-sync | 2024-10-31 07:42:33,367 - Netbox-Zabbix-sync - DEBUG - Host VM01: found template Lab ICMP Ping
netbox-zabbix-sync | 2024-10-31 07:42:33,368 - Netbox-Zabbix-sync - DEBUG - Host VM01: using proxy_group Lab
netbox-zabbix-sync | 2024-10-31 07:42:34,325 - Netbox-Zabbix-sync - INFO - Host VM01: Created host in Zabbix.
To answer your questions:
Question: Why are there three loops to create the group structure in Zabbix? Because the group structure is nested, the script creates each parent group for a host. In your example the group "MyTenant" already existed, but all of the other groups did not. This nesting was introduced a couple of commits earlier in the main branch and makes sure that setting permissions on the main parent ("MyTenant") gives users access to all of the child groups. It's also handy for filtering, etc.
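The nesting can be sketched roughly as follows; the helper below is illustrative, not the script's actual code. Given a generated hostgroup path, every parent path down to, but not including, the top-level group is derived and created as well:

```python
def nested_paths(hostgroup):
    """Return a hostgroup path plus all of its parent paths, deepest first.

    The top-level group (e.g. "MyTenant") is excluded here on the
    assumption that it already exists or is handled separately.
    """
    parts = hostgroup.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 1, -1)]
```

For "MyTenant/MySiteGroup/MySite/MyRole" this yields the three paths seen in the log above.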
I'm wondering why it's touching VMs that have no config context or template to sync. This is actually intended behaviour. Processing the host in this context means gathering initial data to prepare for synchronisation. During this data gathering the script finds out that there is no valid template for this host and continues to the next host, as shown in the logs :) Would you like this to be better defined with logging? Of course, documentation needs to be written for this new process, so I can imagine that things are not 100% clear.
The host was created in Zabbix and the HostID was written back into Netbox. Looking good. Glad to hear! I think it's the first VM sync on your end that actually works! :)
{
    "zabbix": {
        "interface_port": 10500,
        "interface_type": 1,
        "proxy": "lab01",
        "proxy_group": "Lab",
        "templates": [
            "Lab ICMP Ping"
        ]
    }
}
I'm not sure if "interface_port" and "interface_type" are required, since the defaults for VMs are already those values :) But specifying them does not hurt anyone. And I'm wondering if you need the "proxy" parameter, since you also have "proxy_group" configured?
But other than that, I'm very happy that your VM got synced!
To answer your questions:
Question: Why are there three loops to create the group structure in Zabbix? Because the group structure is nested, the script creates each parent group for a host. In your example the group "MyTenant" already existed, but all of the other groups did not. This nesting was introduced a couple of commits earlier in the main branch and makes sure that setting permissions on the main parent ("MyTenant") gives users access to all of the child groups. It's also handy for filtering, etc.
Cool that makes sense, gives me a bit more understanding when I see items in the log.
I'm wondering why it's touching VMs that have no config context or template to sync. This is actually intended behaviour. Processing the host in this context means gathering initial data to prepare for synchronisation. During this data gathering the script finds out that there is no valid template for this host and continues to the next host, as shown in the logs :) Would you like this to be better defined with logging? Of course, documentation needs to be written for this new process, so I can imagine that things are not 100% clear.
Yes, I'm thinking more of a logging thing. It takes time to write stuff to the screen versus the script just doing its thing and letting the user know and/or logging when it actually did something, e.g. creating/updating/deleting a host, etc. A full sync took about 2 minutes for devices and VMs. I'd say that if each VM's output wasn't being written, the sync would run dramatically faster, probably limited only by the API access on the Netbox VM.
Our prod environment has over 200 VMs and even more devices.
How is the device sync running so fast? On the surface it doesn't appear that it's parsing through all of the devices, but I guess it must be, to see whether each needs to be synced or not.
Glad to hear! I think it's the first VM sync on your end that actually works! :)
Yup :) You're doing a fantastic job on this project.
I'm not sure if "interface_port" and "interface_type" are required, since the defaults for VMs are already those values :) But specifying them does not hurt anyone. And I'm wondering if you need the "proxy" parameter, since you also have "proxy_group" configured?
I personally like to see the port and type, since it's a good visual for me to know which contexts have rendered for that device/VM.
Good point about the proxy vs. group. We plan on having multiple proxies and groups, so I'll most likely just drop the direct proxy reference.
I was playing with the format and found one more possible bug: it looks like the location for devices is broken. I tried dev_location as noted in the current docs, as well as location; both failed.
hostgroup_format = "tenant/site_group/site/role/location"
Attaching to netbox-zabbix-sync
netbox-zabbix-sync | 2024-10-31 13:40:26,656 - Netbox-Zabbix-sync - ERROR - Hostgroup item location is not valid. Make sure you use valid items and seperate them with '/'.
netbox-zabbix-sync | Traceback (most recent call last):
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 260, in <module>
netbox-zabbix-sync | main(args)
netbox-zabbix-sync | File "/opt/netbox-zabbix/netbox_zabbix_sync.py", line 93, in main
netbox-zabbix-sync | raise HostgroupError(e)
netbox-zabbix-sync | modules.exceptions.HostgroupError: Hostgroup item location is not valid. Make sure you use valid items and seperate them with '/'.
netbox-zabbix-sync exited with code 1
Good find! Fixed!
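For what it's worth, validation consistent with that error message could be sketched like this; the allow-list and function name are illustrative, not the script's actual ones:

```python
# Hypothetical allow-list; the real script derives valid items itself.
DEVICE_ITEMS = {"tenant", "tenant_group", "site_group", "site", "role",
                "location", "region", "manufacturer", "dev_location"}

def validate_format(fmt, allowed):
    """Raise ValueError on the first hostgroup item not in the allow-list."""
    for item in fmt.split("/"):
        if item not in allowed:
            raise ValueError(f"Hostgroup item {item} is not valid. Make sure "
                             "you use valid items and separate them with '/'.")
    return True
```

The bug above would then amount to "location" missing from the device allow-list.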
How is the device sync running so fast? On the surface it doesn't appear that it's parsing through all of the devices, but I guess it must be to see if it needs to be synced or not.
Could it be that your devices are already synced and your VMs are not? Then it would make sense: for each new VM, the Zabbix API needs to return a valid response to the create() function. To speed operations up, the script pulls all of the hosts and devices beforehand to reduce the number of API calls; this is the main factor in the slowdown.
That said, disabling certain features such as hostgroup nesting can improve performance (fewer API calls to external systems).
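The prefetch approach described here amounts to building a local index once and doing dictionary lookups per host instead of one API call each. A rough sketch with stand-in data, where build_host_index and find_host are hypothetical helpers:

```python
def build_host_index(zabbix_hosts):
    """Index hosts (as returned by one bulk host fetch) by name for O(1) lookups."""
    return {host["host"]: host for host in zabbix_hosts}

def find_host(index, name):
    """Resolve a NetBox device/VM name against the prefetched index."""
    return index.get(name)
```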
In our lab, Netbox only has 12 devices set up for sync out of about 200. A device-only sync takes approximately 15 seconds.
A VM sync with approximately 10 VMs out of 100 takes over a minute and a half.
I timed both the device and VM sync runs.
I'll see if I can do some performance testing on my end :) Input from other users would help out a ton as well.
On my end, the virtual machine sync takes 4 seconds for 33 VMs. I think your sync is taking some time because it waits for Zabbix to add all the template items to the virtual machine. In my case, the only lines that increase the time are "found template" and "started operations on VM". Here is part of the sync output I get for one of my VMs:
2024-11-13 12:52:52,670 - Netbox-Zabbix-sync - DEBUG - Host TEST: started operations on VM.
2024-11-13 12:52:52,767 - Netbox-Zabbix-sync - DEBUG - Host TEST: matched group TEST/TEST/TEST
2024-11-13 12:52:52,767 - Netbox-Zabbix-sync - DEBUG - Host TEST: found template Windows Server by Zabbix agent
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: hostname in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: template Windows Server by Zabbix agent is present in Zabbix.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: template(s) in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: hostgroup in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: status in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: proxy in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: inventory_mode in-sync.
2024-11-13 12:52:52,786 - Netbox-Zabbix-sync - DEBUG - Host TEST: interface in-sync.
For the moment, the only problem I have concerns the filtering. By default, all my VMs are syncing, even though my filter is nb_device_filter = {"tag": "zabbix"}. It works pretty well on devices but not on VMs. Is it a configuration problem on my side?
Hey @LeoDef, you are totally right about the filtering! I've added a separate variable for filtering in config.py.example and synced the code to the virtual_machines development branch. You can pull the latest version of the code and add the variable to your config.py yourself, or copy the example file. Anyway, setting the variable to the following value will result in only VMs with the Zabbix tag being synced :+1:
nb_vm_filter = {"tag": "zabbix"}
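pynetbox applies such a filter dict as keyword arguments on the endpoint, so a sketch of the VM lookup could look like the following; fetch_vms is a hypothetical helper, while the virtualization.virtual_machines endpoint is standard pynetbox:

```python
nb_vm_filter = {"tag": "zabbix"}

def fetch_vms(nb_api, vm_filter):
    """Return the NetBox VMs matching a filter dict like {"tag": "zabbix"}."""
    # pynetbox endpoints accept NetBox API filters as keyword arguments.
    return list(nb_api.virtualization.virtual_machines.filter(**vm_filter))
```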
I'll wait a couple of days for final code testing. I think that the code by now is pretty solid and ready for production; most of the bugs have been squashed.
The code has been implemented in production. Feel free to pull the latest container release and sync some VMs! Make sure to read the latest README and documentation to see what has changed! I'll close this issue in a couple of days if everything keeps working as expected :)
Tested the prod code and all is good. It looks like you were able to tweak the logging on VM syncing, so my total sync time went from 1.5 minutes down to about 20 seconds for everything.
The only minor issue I had to correct was nb_vm_filter missing from my config. Once I added and set it, the sync was great.
Awesome work!
Hello, awesome continued work! I just tested your latest code and it worked flawlessly :). One minor thing is that templates only accepts an array like this:
"templates": [
"Linux by Zabbix agent"
]
It does not accept a regular string like this:
"templates": "Linux by Zabbix agent"
Error:
Netbox-Zabbix-sync - WARNING - Unable to find template L for host <hostname> in Zabbix. Skipping host...
Maybe this is intended, but anyway great work!
It's mainly due to the way Netbox integrates multiple keys together through multiple contexts.
@Albert-LGTM Just like Kage said, this is intended behaviour for now. However, I might be able to do some trickery here and check what the object type is: if it's a string, treat it as a single template, while a list object could have one or more templates.
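That type check could be as small as the sketch below (normalise_templates is a hypothetical name). It would also explain the "Unable to find template L" warning above: iterating over a bare string yields its characters, so the first lookup is likely for "L".

```python
def normalise_templates(templates):
    """Accept a single template name or a list of names; return a list."""
    if isinstance(templates, str):
        # A lone string must be wrapped, otherwise iterating over it
        # would yield individual characters ("L", "i", "n", ...).
        return [templates]
    if isinstance(templates, list):
        return templates
    raise TypeError("templates must be a string or a list of strings")
```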
I'll close this feature request for now, since the code has been implemented in production. Should any new bugs show up in the device or VM sync, please create a new issue for them.
I hear more and more people asking if this script can sync VMs.
Short answer for now: no.
The number one thing that has been holding me back is the implementation of the hostgroup format for VMs. I have been thinking about this limitation, since attributes such as manufacturer, site, etc. are not always relevant or present for a VM.
However, there could be other pieces of data, such as the cluster name of the VM, which the user could use for hostgroup generation.