Add reliable identification of unique client systems

Sebastian-Roth commented 7 years ago

So far FOG has identified clients by MAC address. Although MACs are pretty much unique all over the world, we see more and more people using shared USB NICs to clone systems. This needs addressing.

First we added the System UUID as an identifier, but UUIDs turned out to be pretty unreliable.

Sebastian-Roth commented 7 years ago

To work towards a way of identifying unique client systems reliably, we need to combine the information we have. I don't think it's wise to add another field to the DB, though. Better if we just add logic to the PHP code that uses different identifiers, one after the other, in case we still get too many results from the DB.
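A minimal sketch of that "one identifier after the other" lookup, written in Python rather than FOG's PHP. The in-memory `HOSTS` list stands in for the DB, and the field names and the order in `IDENTIFIER_ORDER` are assumptions made for illustration, not FOG's schema.

```python
from typing import Optional

# Hypothetical stand-in for the hosts table; field names are assumptions.
HOSTS = [
    {"id": 1, "mac": "00:e0:4c:68:00:01", "uuid": "AAAA", "system_serial": "S1", "board_serial": "B1"},
    {"id": 2, "mac": "00:e0:4c:68:00:01", "uuid": "BBBB", "system_serial": "S2", "board_serial": "B2"},
    {"id": 3, "mac": "aa:bb:cc:dd:ee:ff", "uuid": "CCCC", "system_serial": "S3", "board_serial": "B3"},
]

# Order in which identifiers are tried; this order is an assumption.
IDENTIFIER_ORDER = ("mac", "uuid", "system_serial", "board_serial")


def find_host(reported: dict, hosts: Optional[list] = None) -> Optional[dict]:
    """Narrow the candidates one identifier at a time until a single host is left."""
    candidates = HOSTS if hosts is None else hosts
    for key in IDENTIFIER_ORDER:
        value = reported.get(key)
        if not value:
            continue  # the client did not report this identifier, so skip it
        candidates = [h for h in candidates if h.get(key) == value]
        if len(candidates) <= 1:
            break  # unique match found, or nothing matches at all
    # Return a host only when exactly one candidate remains.
    return candidates[0] if len(candidates) == 1 else None
```

With the sample data, two hosts share a USB NIC's MAC, so the MAC alone is ambiguous and the UUID is what settles it. A real implementation would issue DB queries instead of list filters.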

Sebastian-Roth commented 7 years ago

Nevertheless IMHO this means that we have to give up on MAC address being a unique identifier in the DB - which will be a major change. Therefore I started a new branch called unique-identifier to work on this subject.

astrugatch commented 7 years ago

I am running into this exact issue. My workaround has been to issue USB NICs to individual machines, but this is just that, a workaround. If you need testers for this feature please let me know.

Sebastian-Roth commented 7 years ago

We had a long and intensive discussion on this topic and thought that essentially the hard disk is what makes a client a client from the FOG point of view. So I came up with some code to read HDD serial and WWN from the disk firmware. Adding that code to iPXE turned out to be a dead end as there is no disk driver/subsystem in iPXE yet and it would have been a huge task.

I then decided to go back to what we have, UUID, System Serial Number and Mainboard Serial Number. This should be good enough to uniquely identify most machines. Anyhow it's heaps better than what we have right now.

One big step is done. Added most of the code to identify hosts into the FOG PHP code and the FOS scripts. So this will be ready to test fairly soon.

Sebastian-Roth commented 6 years ago

People might be able to change their system UUID in some cases:
https://forums.mydigitallife.net/threads/bios-tools.529/page-19
https://kb.stonegroup.co.uk/how-to-change-the-dmi-information-a-stone-asus-based-system_357.html
https://kb.stonegroup.co.uk/how-to-change-the-dmi-information-on-a-stone-intel-nuc-system_412.html

Sebastian-Roth commented 6 years ago

We need to keep in mind that the MAC address is also used in other places in FOG, e.g. in /images/dev/<MAC_WITHOUT_COLON> - code ref1. Not sure if we need to change that as well. There should not be a conflict as uploads are not as frequent and parallel as deploys are.

astrugatch commented 6 years ago

I have a shipment of machines that just came in with 3 spares so I can do some testing. I can spin up a new FOG instance with another branch if needed.

danielbmiller commented 5 years ago

A couple of things I have come across inventorying machines where I work (small public university, ~3000 units to date, hodgepodge of manufacturers and system configurations):

Serial numbers are not universally unique. Prime examples of this are unset serial number fields (as discussed in the forums), but I have seen small companies that use literal serial numbers to identify their machines.
Serial numbers are not necessarily unique within a company. I have run across companies that reuse/recycle serial numbers (Dell and IBM / Lenovo come to mind). Instances have been few and far between and it has been quite a while since I have encountered one, but it is a thing.

Because of these issues, I have had to get things down to a model + serial number arrangement to (mostly) assure a unique identity. For those that don't have programmatically accessible serial numbers I have resorted to manual entry, and for those that just don't have a serial number, period, I have been known to create arbitrary serial numbers and grab a sharpie. Thinking about things, it would probably make sense to start using asset tags for the latter.

With this in mind, along with the other issues brought up in the forums, it seems like using all the available information for the static components (what defines a host) might be useful. With this many values in play, programmatically choosing one is an option, or we could take them all at once. I'm not very familiar with iPXE, but if a cryptographic hash function is available to the init environment, running a string of those values through it would capture the uniqueness without needing to decide which identifier to use or having to keep track of multiple values. It wouldn't be without its downsides (troubleshooting and reliably reproducing the identifier in the client come to mind), but it could potentially result in more robust identifiers.

Just another spitball to consider.

astrugatch commented 5 years ago

I think danielbmiller is on the right track. A hash of two or more readable fields may be the best move. This way you can get away with one being the same across models without getting the same identifier. MAC address should still NOT be one of the pieces used, since this is the item that is no longer consistent on units that ship without ethernet ports.

junkhacker commented 5 years ago

this has been discussed and won't work as well as it looks like on the surface. i've seen plenty of systems where the only information that's available in the PXE environment, aside from MAC address, is identical (or blank) for all systems of that make and model. so the hashes would match multiple machines.
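To make that objection concrete, here is a small sketch showing that hashing cannot add uniqueness that the inputs lack. The field names and sample values are invented for illustration, and SHA-256 is just one possible hash choice.

```python
import hashlib
from collections import defaultdict


def smbios_hash(fields: dict) -> str:
    """Hash all reported fields in a fixed (sorted) key order."""
    joined = "|".join(f"{key}={fields[key]}" for key in sorted(fields))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_collisions(machines: dict) -> list:
    """Return lists of machine names whose hashes are identical."""
    groups = defaultdict(list)
    for name, fields in machines.items():
        groups[smbios_hash(fields)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Illustrative data only: units of one model with blank or placeholder values.
machines = {
    "lab-01": {"product": "Model X", "serial": "To Be Filled By O.E.M.", "asset_tag": ""},
    "lab-02": {"product": "Model X", "serial": "To Be Filled By O.E.M.", "asset_tag": ""},
    "lab-03": {"product": "Model X", "serial": "SN123", "asset_tag": ""},
}
```

Here `find_collisions(machines)` groups `lab-01` and `lab-02` together, because every input field is identical for those two units. Only `lab-03`, which has a real serial, gets its own hash.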

astrugatch commented 5 years ago

What information is available in the PXE environment?

Do we have good examples of what other management/ imaging systems use as identifiers?

danielbmiller commented 5 years ago

More spitballing.... (and yes I am as much trying to get the information from posts spread out across the forums here as I am trying to spark discussion)

I think http://ipxe.org/cfg has a list of default attributes. Looking through my last Ghost server (v2.5, some machines I have can't PXE), looks like they use mac addresses; I seem to remember Ghost getting quite irritated when I had machines with multiple NICs.

If the data isn't accessible at the iPXE stage, can the non-unique identifier be classed as problematic and handled as an edge case?
For those identified edge cases, can the data available to iPXE be changed so that the uniqueness issue goes away, possibly via asset tag? I believe the devs behind iPXE say it can't be done permanently from within iPXE, but A) could FOS perform the changes (with smbios-utils or something similar), B) is that something that FOS should do, and C) would this approach even work for the kinds of devices that have ambiguous identifiers?

Could the TPM be leveraged to either locate, generate, or store an identifier? Theoretically, there is a unique-ish RSA key that could be used to encrypt a known string... But then again, are the problematic models likely to have a TPM, and could that TPM be interacted with during a PXE session?

astrugatch commented 5 years ago

I suppose the bigger discussion needs to be about what is the acceptable edge case. Currently systems without built-in Ethernet are considered edge cases by FOG, and that was fine before ultrabooks and high-speed wireless became the norm. My question would be: if UUID could be made to work (I know @Sebastian-Roth mentioned issues with that code in this thread), are the manufacturers that don't adhere to unique UUIDs the new acceptable edge case? And a sub-question would be: who isn't adhering to that? If, say, Dell, HP, Lenovo, Acer and Asus all follow a standard, but beige box or Aliexpress machines don't, I would personally see that as acceptable.

Sebastian-Roth commented 5 years ago

@astrugatch The most valuable information on this you can find in the forum: https://forums.fogproject.org/topic/10987/what-can-we-do-when-we-don-t-trust-uuid

astrugatch commented 5 years ago

@Sebastian-Roth

I just looked over that post. I did not realize how all over the map manufacturers are. I come from a mostly Apple environment so standardized inventory isn't really an issue.

Based on what I see there, Motherboard serial + HDD serial seems like the best non-network-based combination.

I know it was mentioned in that thread that HDDs can be replaced or changed, but the computer could be re-enrolled at that point and added to the appropriate group (assuming persistent groups are used), and they are back in business. I feel like systematically that's easier than hunting down dongles and making sure they remain paired to computers, or constantly shifting MAC addresses around during imaging.

junkhacker commented 5 years ago

HDD Serial is available within FOS, but not iPXE

George1422 commented 5 years ago

(man 2017 that was almost 2 years ago now)

When I was working on this, the only solution I found was to extend iPXE by adding a few SMBIOS commands that are needed to make a unique composite device identifier. Where I ended up is using a composite ID of manufacturer, product, serial, asset tag, uuid, and mac address and passing them from iPXE to the FOG PHP code, where it would concatenate those fields into a string and then take the md5 of the string to give us a system unique identifier.

The problems I ran into are:

Not all vendors put values in the same place, i.e. the serial number can be on the system board or on the chassis.
iPXE doesn't return all the SMBIOS values we need today, so it will need to be extended.
I am not good at C programming. I can read it and hack it, but not create it.

Ideally iPXE could generate this system unique uuid, because it has direct access to all of the SMBIOS values we need and could create the hash and send it out as one value that the FOG PHP code picks up and uses. The issue is I'm not a C programmer. Looking at the code it could be done, but again I'm not a C programmer.
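The composite identifier described in this comment can be sketched as follows, in Python rather than PHP or C. The field order follows the list given above. The `|` separator and the strip/lowercase normalisation are assumptions added for illustration, and `composite_id` is a hypothetical name.

```python
import hashlib

# Field order as listed in the comment above.
FIELDS = ("manufacturer", "product", "serial", "asset_tag", "uuid", "mac")


def composite_id(values: dict) -> str:
    """Concatenate the fields in a fixed order and return the md5 hex digest."""
    # Missing fields become empty strings; normalisation is an assumption.
    parts = [str(values.get(name, "")).strip().lower() for name in FIELDS]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
```

A fixed field order and an explicit separator matter here: without the separator, `("ab", "c")` and `("a", "bc")` would concatenate to the same string and hash identically.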

danielbmiller commented 5 years ago

Looks like Tivoli Provisioning Manager uses a four part key consisting of IP address, MAC Address, UUID, and Serial Number with the requirement that at least one value be non-null.
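As a rough illustration of such a key, the sketch below models only the "at least one value non-null" rule mentioned above. The class and attribute names are hypothetical, and how Tivoli actually matches on the key is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FourPartKey:
    ip_address: Optional[str] = None
    mac_address: Optional[str] = None
    uuid: Optional[str] = None
    serial_number: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the "at least one value non-null" rule.
        # Empty strings are treated as null here (an assumption).
        if not any((self.ip_address, self.mac_address, self.uuid, self.serial_number)):
            raise ValueError("at least one identifier must be set")
```

Because the dataclass is frozen, instances are hashable and can be used as dictionary keys or set members.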

Was googling through some things and it looks like smbios may be generally exposed in iPXE, just not with nice names all around. see: https://github.com/ipxe/ipxe/pull/21#issuecomment-45793439 http://etherboot.org/wiki/commandline

Of note is the ${smbios/x.y.z} form for arbitrary access. Assuming this route is viable, the bear would be going through and determining what x.y.z is for the attributes we are interested in from the DMTF standard ( https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.1.1.pdf ).

George1422 commented 5 years ago

FOG uses the dmidecode Linux application to access the values in SMBIOS. I was able to mimic the code and add two of the needed SMBIOS variables into iPXE, but then started thinking about using iPXE to do the combination of variables to make a composite sysuuid value and got in over my head. The iPXE code does contain the crypto code needed to create the md5 hash as far as I could tell. So all of the bits needed to create the sysuuid value seem to be in there, they just need to be glued together.
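A minimal sketch of reading those values on the Linux side, assuming the `dmidecode` binary is installed and the script runs as root. The keyword selection is one plausible choice, and `read_smbios` is a hypothetical helper, not FOS code. The runner is injectable so the logic can be exercised without touching real hardware.

```python
import subprocess
from typing import Callable

# dmidecode string keywords (passed via `dmidecode -s <keyword>`).
KEYWORDS = (
    "system-manufacturer",
    "system-product-name",
    "system-serial-number",
    "system-uuid",
    "baseboard-serial-number",
    "baseboard-asset-tag",
    "chassis-serial-number",
    "chassis-asset-tag",
)


def _run_dmidecode(keyword: str) -> str:
    """Run `dmidecode -s <keyword>` and return its trimmed output."""
    result = subprocess.run(
        ["dmidecode", "-s", keyword],
        capture_output=True,
        text=True,
        check=False,  # a failing lookup simply yields an empty string
    )
    return result.stdout.strip()


def read_smbios(run: Callable[[str], str] = _run_dmidecode) -> dict:
    """Collect every keyword into a dict, using the given runner."""
    return {keyword: run(keyword) for keyword in KEYWORDS}
```

Calling `read_smbios()` with no argument queries the local machine. Passing a fake runner, for example `read_smbios(lambda kw: "test")`, avoids the need for root.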

Sebastian-Roth commented 5 years ago

@ghooper90 said:

So all of the bits needed to create the sysuuid value seem to be in there, they just need to be glued together.

While you are mostly right on this (SMBIOS info available but md5/sha/base64 not available as simple string manipulation commands I think) I still feel that there is no consensus about which details to use.

I would consider the MAC address still being part of it as it is one very reliable detail on probably 90% of the machines. But @astrugatch argues against this for a good reason as well.

HDD serial is pretty much out I reckon! And I say this for a good reason. I have looked into adding HDD serial to the iPXE code. I started off by implementing it in C code to run on Linux and it worked great. Then I moved on to see how to access HDD information in the low level environment iPXE is running on early boot. I even talked to the iPXE devs but there is not much interest in this because they don't consider local disk access (of any form) valuable on PXE boot. I don't remember the details but I think NVMe/AHCI/RAID and such things would make it very hard to come up with reliable code anyway.

danielbmiller commented 5 years ago

Assuming there isn't a desire to collect information beyond that which FOG is currently storing for inventory purposes, unioning the attribute sets from https://forums.fogproject.org/topic/10987/what-can-we-do-when-we-don-t-trust-uuid and excluding the sections that I would expect to be (relatively) easy to change or not natively accessible to PXE (Hard disk, Memory, Bios, and I seem to remember some discussion of CPU replacements as well) would leave the attributes related to System, Motherboard, and Chassis.

System Manufacturer
System Product
System Version
System UUID
System Serial Number
System Type
Motherboard Manufacturer
Motherboard Product Name
Motherboard Version
Motherboard Serial Number
Motherboard Asset Tag
Chassis Manufacturer
Chassis Version
Chassis Serial
Chassis Asset

Of those attributes, I believe the Serial Numbers (system, mb, and chassis), UUID, and Asset Tags (mb and chassis) would successfully identify almost all of the machines I have deployed (some notable exceptions are those with replaced mobos that didn't have their SNs re-entered by whoever serviced the machine). It also appears that this set of attributes would be sufficient for uniquely identifying most of the examples given in the UUID thread, but it is hard to tell with the redactions. Having the asset tags in the mix would provide a means to force some kind of uniqueness should all else fail. I would like to see some sort of product identification in there as well, but with the lack of consistency of those identifiers, I don't know if I could convince myself that adding 5 or more fields to that initial query would be worth guarding against a very small set of cases.

Another idea ... how accessible are the non-active mac addresses from iPXE? Possibly not very ( http://forum.ipxe.org/showthread.php?tid=6886&pid=9054 ) but maybe in future?

Other ideas?

Then I guess another issue to address is how to both identify and handle a machine that isn't uniquely identified by the data available.

mastacontrola commented 5 years ago

My worry is that there are so many "options" but none of them seem to be standardized. MAC Address, in my opinion, is by far the most "stable" method when comparing what we're looking at.

Even HDDs sometimes don't have elements set appropriately.

That being said, I do agree that we need a better method of identifying hosts than using MAC Address with so many machines coming without NICs on board these days.

The best option I can think of is to use the HDD serial. Ultimately, the image is placed on the HDD. The HDD resides in the host. If you have to replace the HDD, you have to replace the image on the machine to begin with. Could we focus on just the HDD elements? If we need to, create a GUID generator based on the HDD's serial number. I really think this would be the best approach, though I understand iPXE not being able to get on board with it. Maybe we can fork it and come up with something ourselves for this?

Just spitting this out there.

astrugatch commented 5 years ago

@mastacontrola

I agree with this thought process. I think the hard drive is the focal point of FOG, and from the outputs in the UUID thread the drive serials, while possibly not perfect, do appear to be more standardized and accurate.

BUT

As @Sebastian-Roth said:

HDD serial is pretty much out I reckon! And I say this for a good reason. I have looked into adding HDD serial to the iPXE code. I started off by implementing it in C code to run on Linux and it worked great. Then I moved on to see how to access HDD information in the low level environment iPXE is running on early boot. I even talked to the iPXE devs but there is not much interest in this because they don't consider local disk access (of any form) valuable on PXE boot. I don't remember the details but I think NVMe/AHCI/RAID and such things would make it very hard to come up with reliable code anyway.

So I guess conceptually HDD Serial seems like the logical solution, but there may be rather large practical/technical hurdles for you Devs.

Quazz commented 5 years ago

Can we guarantee that HDD serials are going to be unique, though?

The problem will ultimately always be: Did the manufacturer follow the rules neatly or did they ignore them blatantly?

In the latter situation, we will always run into problems, regardless of the identifier we use.

MAC was useful since they HAD to be unique; it was the only way it could function, after all.

But none of this info technically has to be unique for utilitarian reasons, so we can expect ourselves to run into trouble at one point or other.

Don't mean to be negative, just would hate to see us go down a long difficult path full of potential bugs and end up being forced to conclude that it was no better than UUID or similar identifiers.

mastacontrola commented 5 years ago

I'm fairly certain the HDD serial will always be unique as well. Getting access to the hdd, on the other hand, will be a whole project in and of itself.

danielbmiller commented 5 years ago

I think even HDD serials may be problematic; virtualization platforms may not provide those, at least the VMware example in the UUID thread didn't.

mastacontrola commented 5 years ago

I am fairly sure VMs generate random serial numbers for the HDD as well. To be fair, I haven't looked at the fog inventory as we do try to get the inventory item for it, but I do remember seeing serials.

To test: Xen, KVM, VirtualBox, Hyper-V. I usually work with VMware.

George1422 commented 5 years ago

I really think leaving the HD serial numbers out of the composite calculation is best. As Sebastian said nvme don't all seem to have serial numbers to reference. When I get the chance I'll sort out what fields would be best to use. As mentioned above I did want to include both the board and chassis asset tags so the FOG admin can at the very least break the tie if two systems' sysids are calculated the same.

Sebastian-Roth commented 5 years ago

As Sebastian said nvme don't all seem to have serial numbers to reference.

Although this is not what I meant/said, I still think that HD serial will cause us (mainly me, I suppose) a hell of a lot of work that we/I had better put into other things. One issue I see with NVMe is that accessing the HD serial will work differently than on classic HDs/SSDs. I don't even want to think about coming up with C code in iPXE to support all those different disk models (meaning SATA HD vs. PCIe NVMe!!)

Also, HD serials in VMs are highly likely not to be unique. A quick search on the web revealed this: https://github.com/rockstor/rockstor-doc/blob/master/quickstart.rst

All drives must have unique serial numbers (real drives do); not all VM [*] systems default to this.

On the other hand, the question is: Are the other options set properly for VMs? I just checked for VirtualBox and System UUID seems fine. I just updated George's initial post. For VMware, UUID seems not to be available, but System Serial Number seems unique.

Sebastian-Roth commented 5 years ago

@ghooper90 said:

As mentioned above I did want to include both the board and chassis asset tags so the FOG admin can at the very least break the tie if two systems' sysids are calculated the same.

Definitely a good idea. I think we all agree to that!
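One way the asset-tag tie-break could look, sketched in Python under stated assumptions: each host is a dict with a `name`, a computed `sysid`, and optional admin-set `board_asset_tag` and `chassis_asset_tag` values. All of these names are hypothetical.

```python
from collections import Counter


def disambiguate(hosts: list) -> dict:
    """Return {host name: key}; asset tags are appended only where sysids collide."""
    counts = Counter(h["sysid"] for h in hosts)
    keys = {}
    for h in hosts:
        key = h["sysid"]
        if counts[key] > 1:
            # Tie: fall back to the admin-set asset tags (board, then chassis).
            key = f'{key}:{h.get("board_asset_tag", "")}:{h.get("chassis_asset_tag", "")}'
        keys[h["name"]] = key
    return keys


def unresolved(keys: dict) -> list:
    """Host names that still share a key after the asset-tag fallback."""
    counts = Counter(keys.values())
    return sorted(name for name, key in keys.items() if counts[key] > 1)
```

Hosts with a unique `sysid` keep it unchanged, so asset tags only come into play for the colliding machines. Anything `unresolved` returns would need the admin to set distinct asset tags.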

danielbmiller commented 5 years ago

Would it also make sense to look into providing the capability to write asset tags for that purpose, either in FOS or some other solution?

detay321 commented 3 years ago

I'm dealing with this issue right now. I'm assuming this feature has been integrated into the regular stable branch. If it has, how do I enable it or change what the unique identifier is?

Sebastian-Roth commented 3 years ago

@detay321 Finding a reliable unique system identifier turned out to be very hard, and we never came to terms on what to use, so it was never implemented.

You might want to explain what issue you are dealing with in more detail and most likely we can help you solve it another way.

ahardylaniertech commented 3 years ago

I've just now found this thread (lots of great info) and would love to assist in any way I can as this would be extraordinarily useful to my org. Would it be an option to allow the admin to pick the unique system identifier options during the install? In my case, all of my machines have a unique chassis serial number. This would easily fix the issue for me. However, at least one older model that I have does not have HDD serials (why?!?!). Perhaps allow admins to pick the combined fields too? I'm assuming this could all be handled within the php code at the time of registration (just combine vars) and within the client as well?

Obviously this would be a temporary fix, but imaging PCs without physical nics is becoming SO common that this would be a nice workaround. I'll dig into the php code and see what I can find. We have PLENTY of test machines of varying models and makers.

Sebastian-Roth commented 3 years ago

@ahardylaniertech You are more than welcome to dig into this. I might give you a few pointers:
https://github.com/FOGProject/fos/search?q=uuid
https://github.com/FOGProject/fos/blob/master/Buildroot/board/FOG/FOS/rootfs_overlay/bin/fog.checkin#L96
...
https://github.com/FOGProject/fogproject/search?q=sysuuid
https://github.com/FOGProject/fogproject/blob/dev-branch/packages/web/lib/fog/fogbase.class.php#L526

Many years back when we tried to introduce this it turned out the sysuuid was not unique at all. Back then the code caused a lot of trouble for some people using the latest developer version and we jumped back but never got our act together to bring it back in and properly test on enough systems to make sure it does not fail us.

One very important point is that we can only use what is available in iPXE (as extending the iPXE code would be a huge effort on top and using a different network boot loader is even more scary to introduce now). Take a look at the iPXE config menu on your machines to see what's available. To get into the iPXE config menu you can create a custom iPXE entry and call config from there.

ahardylaniertech commented 3 years ago

Awesome. Thanks for the pointers. I've got a steep learning curve in figuring out exactly how FOG code works, but I'll give it a shot. I've pulled a copy of the code and likely will have several questions as I dig in. Thanks for being so responsive already.

ahardylaniertech commented 3 years ago

Wow. You really pointed me at the exact spot, ty! What do you think about giving the admin the option to choose? By default the system uses the MAC, but the admin could choose to change (at any point in time) to using a different method. Only machines added from that point forward would be affected. There could be a few options (MAC, UUID, chassis serial, a combination of those, or manual input during full registration). It shouldn't require any changes to ipxe or any of that code, just the PHP side of things and perhaps the FOG client.
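A rough sketch of what such an admin-selectable method might look like, in Python rather than FOG's PHP. The option names, the `host_key` helper, and the `reported` field names are all assumptions for illustration. Manual input is left out, since nothing in the sketch could supply it at boot time.

```python
from enum import Enum


class IdMethod(Enum):
    MAC = "mac"
    UUID = "uuid"
    CHASSIS_SERIAL = "chassis_serial"
    COMBINED = "combined"


def host_key(reported: dict, method: IdMethod = IdMethod.MAC) -> str:
    """Build the lookup key for a host from the values it reported."""
    if method is IdMethod.COMBINED:
        keys = ("mac", "uuid", "chassis_serial")
    else:
        keys = (method.value,)
    # Refuse to build a key from blank or missing values.
    missing = [k for k in keys if not reported.get(k)]
    if missing:
        raise ValueError(f"host did not report: {', '.join(missing)}")
    return "|".join(reported[k].strip().lower() for k in keys)
```

MAC stays the default, so existing setups behave as before unless the setting is changed. Raising on a missing value keeps blank fields from silently collapsing several hosts onto the same key.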

I'm not clear on how the client identifies itself to the system . . I would assume MAC, right?, but does it submit a combo of vars like the ipxe client during registration (so we could handle on the php server side) or, I would expect more likely, a code change so it can determine the value it submits.

Interested in your feedback.. Ty!

Edit: Well I guess during ipxe boot we'd have to have it do the calc of what it's sending I suppose as the php side could only handle that at registration, right? During check-in for tasks I'm guessing the ipxe client doesn't send all of the info for php to process, correct?

astrugatch commented 3 years ago

I like the idea @ahardylaniertech presents with choosing. For us UUID or system serial is perfect because we stick with Dell and they adhere to those standards as far as I can tell. YMMV with other vendors, but having the fallback to MAC is a solid option.

Sebastian-Roth commented 3 years ago

@ahardylaniertech said:

What do you think about giving the admin the option to choose?

On the one hand it would be good so people can adjust things to their needs. On the other hand it will make the coding more complex as well as prone to errors (e.g. unexpected stuff you can't know about beforehand). Also people might get the setting wrong, causing trouble in their environment.

By default the system uses the MAC, but the admin could choose to change (at any point in time) to using a different method.

I like the idea of leaving MAC as default so as not to affect all users straight away with such a fundamental change.

Only machines added from that point forward would be affected.

I don't think this is going to work out. I wouldn't want to keep track of which method to use for each individual machine/host. Makes the logic even more complex.

There could be a few options (MAC, UUID, chassis serial, a combination of those, or manual input during full registration)

While the first three seem fine from my point of view "manual input" is not going to work because we can't burn that information into a ROM in the machine/host for next time it boots up again. We need to use the information provided by each individual host.

It shouldn't require any changes to ipxe or any of that code, just the PHP side of things and perhaps the FOG client.

Haven't spent too much time thinking this through yet. As I said, I wouldn't touch the iPXE code itself. PHP server code sure and also the FOS init scripts need adjustment (see links above). Not sure about the fog-client code.

I'm not clear on how the client identifies itself to the system . . I would assume MAC, right?

The host/machine booting through iPXE sends a couple of vars/values (MAC, sysuuid, maybe more) but we only ever look at the MAC address on the server side (PHP code).

but does it submit a combo of vars like the ipxe client during registration (so we could handle on the php server side) or

Not sure I understand this. If set to boot from network (PXE) a host would send "a combo of vars" (see my comment above) on each and every boot. Sure not as many vars/values as we send on registration (like disk size and so on) but definitely more than just the MAC. I think this is an important part, so please pay attention: We need to use what's available early in the very first stage of PXE booting because at that point we need to distinguish between hosts. We can't boot up to the point where the Linux kernel (bzImage) is loaded because decision making needs to happen before that (show menu, send to imaging task, ...).

To make a long story short: We need to identify hosts/machines at iPXE boot stage. Doesn't mean we need to dig into the iPXE code but we just have to use what's available at this stage of booting. Take a look at the iPXE config command as well as iPXE cfg vars/values.

FOGProject / fogproject

Add reliable identification of unique client systems #198