[Multiple] Invalid DHCP configuration creates IP address conflicts

nccerhostmaster commented 2 years ago

Platform

VMware Marketplace

bndiagnostic ID

0500883d-4e1b-7670-9f86-26a68eb2fa89

bndiagnostic output

[Apache]

A high number of incoming requests originate from one or more unique IP addresses. This could indicate a bot attack. The following guide shows how to check for and block suspicious IP addresses.

https://docs.bitnami.com/bch/apps/moodle/troubleshooting/deny-connections-bots-apache/

[Resources]

Your instance has little available RAM memory.

              total        used        free      shared  buff/cache   available
Mem:            473         321          12           2         139         136
Swap:           634          18         615

You could try to increase your instance's memory. Please check your cloud provider's documentation for more information.

[Connectivity]

Server ports 22, 80 and/or 443 are not publicly accessible. Please check the following guide to open server ports for remote access:

https://docs.bitnami.com/general/faq/administration/use-firewall/

bndiagnostic was not useful. Could you please tell us why?

None of the issues it found are significant or relevant to DHCP

Describe your issue as much as you can

When I start two separate VMware instances based on the Joomla or Resourcespace images, they obtain a DHCP address using a bogus MAC address with 36 hex digits.

Both machines get the same IP due to using the same bogus MAC address.

Using dhclient, I can release and renew the lease and get a different address based on a valid MAC. However, on reboot the machine reverts to the invalid MAC address.

I have only encountered this problem with Bitnami images. It does not happen with Debian or other distros.

gongomgra commented 2 years ago

Hi @nccerhostmaster,

Thanks for using Bitnami. Can you check if you reproduce the error by following our guide on how to launch a cloud image in VMware Marketplace?

https://docs.bitnami.com/vmware-marketplace/get-started-vmware-marketplace/

nccerhostmaster commented 2 years ago

I don't understand this recommendation. How could launching a cloud image in AWS be relevant to my issue with DHCP on my local network?

nccerhostmaster commented 2 years ago

Issue is with a LOCAL hypervisor, using a LAN with DHCP. I can configure static IP addresses, but why should I have to do that? Shouldn't DHCP work as expected? Is DHCP not supported?

nccerhostmaster commented 2 years ago

I can reproduce this with multiple different Bitnami VMs. Each one I deploy has the exact same client ID, resulting in being issued the same IP address. Steps to reproduce:

1. Deploy any of several Bitnami Debian-based OVAs on a VMware hypervisor.
2. Start the VM.
3. It gets the same IP address as all the other Bitnami VMs deployed the same way.
4. All VMs have the same client ID in DHCP.

nccerhostmaster commented 2 years ago

Google searches show that in the past there were multiple separate discussions about this issue on the old community site. The fact that all of that content was destroyed is a real shame, as I can see there were probably solutions in the past. Whoever decided to destroy all that content without providing any sort of archive or migration is a real jerk.

recena commented 2 years ago

@nccerhostmaster Thank you so much for your contribution and participation.

According to the information provided, your scenario does not seem to be related to the VM itself but to how you are managing your local environment. Our suggestion is to go through the idea of generating a new MAC address during the deployment.

There are a lot of references on the internet to solve this scenario.

nccerhostmaster commented 2 years ago

Wow, this new support platform is really awful.

swinster commented 2 years ago

I would like to reopen this case as I have seen the same issue and I believe that it IS related to the VM appliance image.

When you create multiple VMs from the same OVA, VMware has the sense to alter the MAC address assigned to the vNIC. However, the VM also sends a client identifier (which, as far as I understand, is optional) in its DHCP Discover message. This client identifier is the same for all VMs created from the same OVA (in my case a Debian-based LAMP appliance).

A DHCP server can use this client UID to allocate addresses, so the same IP address can be issued to different VMs. It is possible to configure some DHCP servers to ignore the client UID, but this breaks the DHCP protocol guidelines. Since the identifier is optional, perhaps it would be best if Bitnami configured the appliance not to send it at all, or at least made it unique from appliance to appliance.
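A rough sketch of how the client identifier could be checked on the wire, assuming tcpdump is available on a host that can see the DHCP traffic. The interface name `eth0`, the packet count and the helper name `extract_client_ids` are illustrative assumptions, and the exact layout of the Client-ID line varies between tcpdump versions.

```shell
# Reduce verbose tcpdump text (read from stdin) to the DHCP Client-ID
# (option 61) lines, and count how often each distinct line occurs.
extract_client_ids() {
    grep -i 'Client-ID' | sed 's/^[[:space:]]*//' | sort | uniq -c
}

# Example capture (needs root; "eth0" and the packet count are assumptions):
#   sudo tcpdump -i eth0 -n -vvv -c 20 'udp and (port 67 or port 68)' | extract_client_ids
```

If two VMs with different MAC addresses show up under a single Client-ID line, they are presenting the same identifier to the DHCP server.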

Below are a couple of screen grabs from the same PCAP, in which two VMs boot one after another and are issued the same IP address. Note the differing VMware MAC addresses but the same client UID:

swinster commented 2 years ago

Check out https://www.rfc-editor.org/rfc/rfc2131#section-2, where it says:

The 'client identifier' chosen by a
DHCP client MUST be unique to that client within the subnet to which
the client is attached. If the client uses a 'client identifier' in
one message, it MUST use that same identifier in all subsequent
messages, to ensure that all servers correctly identify the client.

and https://www.rfc-editor.org/rfc/rfc2131#section-4.2:

A DHCP server needs to use some unique identifier to associate a
client with its lease. The client MAY choose to explicitly provide
the identifier through the 'client identifier' option. If the client
supplies a 'client identifier', the client MUST use the same 'client
identifier' in all subsequent messages, and the server MUST use that
identifier to identify the client.

So, it is entirely optional for a client to send the client identifier, but if it does, the server MUST use it to identify the client.

That paragraph goes on to say:

If the client does not provide a
   'client identifier' option, the server MUST use the contents of the
   'chaddr' field to identify the client. It is crucial for a DHCP
   client to use an identifier unique within the subnet to which the
   client is attached in the 'client identifier' option.  Use of
   'chaddr' as the client's unique identifier may cause unexpected
   results, as that identifier may be associated with a hardware
   interface that could be moved to a new client.  Some sites may choose
   to use a manufacturer's serial number as the 'client identifier', to
   avoid unexpected changes in a clients network address due to transfer
   of hardware interfaces among computers.  Sites may also choose to use
   a DNS name as the 'client identifier', causing address leases to be
   associated with the DNS name rather than a specific hardware box.

Soooooo, I think you guys need to think about how these appliances are going to be deployed.

nccerhostmaster commented 2 years ago

I knew I wasn't doing anything wrong...

Changing the MAC address of the VM is completely irrelevant, and does nothing. Thanks for this research, it's proof that the issue is with the VM image, and not with me being a moron who doesn't know how to deploy a VM properly.

swinster commented 2 years ago

np. TBH, I had no idea that this client identifier (or UID) played such an important role in DHCP address allocation. My assumption, until a day ago, was that this was all based around hardware MAC addresses. Just shows we should never stop learning :).

How Bitnami resolves this issue, I do not know. Perhaps they remove the Client UID (as mentioned above) given it is optional, or perhaps they have some script that runs on the first boot that makes this unique. I'm not entirely sure how other appliance manufacturers tackle this problem, however, I do work for a software company that produces VM appliances which are loosely based on Debian, and on test deploying our OVAs, our OS guys manage to reset the UID so that it is unique. I have asked a question as to what they do, but obviously, this is really for Bitnami to figure out.

recena commented 2 years ago

Thank you @swinster for your research. We will take a look and get back to you with some feedback.

swinster commented 2 years ago

@recena, I have now had feedback from our OS guys. Looks like you have been a bit naughty and left /etc/machine-id baked into the OVA image. Apparently, this is a BIG no-no, on a par with embedding SSH keys.

javsalgar commented 2 years ago

Hi,

Thank you so much for the information! I've been doing some experiments on my side.

My setup was VMware Fusion 12.2.4 in my Intel Mac OS. I used the latest LAMP OVA available in the Bitnami site. I imported the OVA twice so I have two VMs available for testing.

I performed my experiments with different network settings:

Test 1: Using VMware Fusion internal network

With these settings, I was able to reproduce the issue, obtaining the same IP address despite having different MAC addresses (as reported before)

I was able to consistently reproduce the issue every time I launched the two VMs at the same time. However, when launching the instances with a time window of several minutes, the instances received different IP addresses in a consistent manner:

I'm not an expert, but it looked like a race condition to me. I tried performing the following changes with no luck:

Commenting out the send hostname section in /etc/dhcp/dhclient.conf and restarting. I could see that it was the only send section in dhclient.conf that was uncommented so I thought it could explain the issue. However, after restart that did not work.

Removing the /etc/machine-id file on both machines and restarting (as suggested by @swinster). That didn't help either.

Test 2: Use default network settings

I also tried with the following network settings:

In this case, no matter how many times I tried, I always get different IP addresses:

IMPORTANT: Switching back to the previous network setting caused the "race condition" error to appear again.

Having got to this point, it would be very helpful if you could share more details of your testing platforms, as I'm not sure we are all using the same DHCP server (in which case it would be interesting to check what is triggering the issue). Any other information that helps us dig into the cause would be appreciated.

swinster commented 2 years ago

@recena I can assure you it's not a race condition :)

You use systemd (certainly in the Debian image I am using) for networking (networkd), and this is extracted from the man page:

       The following options are understood:

       DUIDType=
           Specifies how the DUID should be generated. See RFC 3315[1] for a description of all the options.

           The following values are understood:

           vendor
               If "DUIDType=vendor", then the DUID value will be generated using "43793" as the vendor identifier (systemd) and hashed contents of machine-id(5). This is the default if
               DUIDType= is not specified.

           uuid
               If "DUIDType=uuid", and DUIDRawData= is not set, then the product UUID is used as a DUID value. If a system does not have valid product UUID, then an application-specific
               machine-id(5) is used as a DUID value. About the application-specific machine ID, see sd_id128_get_machine_app_specific(3).

           link-layer-time[:TIME], link-layer
               If "link-layer-time" or "link-layer" is specified, then the MAC address of the interface is used as a DUID value. The value "link-layer-time" can take additional time
               value after a colon, e.g.  "link-layer-time:2018-01-23 12:34:56 UTC". The default time value is "2000-01-01 00:00:00 UTC".

So the thing sent on the wire is the result of DUIDType=vendor (which is the default configuration of systemd-networkd).

If you look at the /etc/machine-id file in my OVA (and, I guess, in @nccerhostmaster's), you may find a static value. In my case, this is:

fd261bde44c9477bba0882b166c9e795

This then propagates to the actual VM, and thus all VMs created from that image will have the same client identifier. This is explicitly seen in the PCAP image above, so is indisputable!
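As a minimal sketch, the duplication could be confirmed by collecting a copy of /etc/machine-id from each VM and looking for repeats. The helper name `duplicate_machine_ids` and the idea of copying the files to one place are assumptions for illustration; how the copies are gathered is left open.

```shell
# Print every machine ID that appears more than once among the given files.
# Each argument is a copy of /etc/machine-id taken from one VM.
duplicate_machine_ids() {
    cat "$@" | sort | uniq -d
}

# Example (file names are hypothetical):
#   duplicate_machine_ids vm1.machine-id vm2.machine-id vm3.machine-id
```

Any output means at least two VMs share a machine ID, and with the default DUIDType=vendor they would therefore derive the same DUID.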

As mentioned, leaving this file baked into the image that you then publish as an OVA is a BIIIIG NO-NO. The file needs to be deleted so that when you create a VM from the image, a new and unique machine identifier is created.

swinster commented 2 years ago

FYI, I use pfSense as a DHCP server. There is an option of ignoring client identifiers, however (as stated above) this breaks the RFC that governs DHCP (RFC 2131). However, it is NOT the DHCP server you should be concentrating on here. It is what the VMs put onto the wire, as explicitly demonstrated in the above PCAP screenshots. Using the same UID to identify different VMs is a MEGA NO-NO. It contravenes this specific RFC, and I suspect could have multiple other implications.

It is what is known in the trade as a "slam dunk". At least I believe that is what the kids might say.

swinster commented 2 years ago

The /etc/machine-id file needs to be removed from the base IMAGE before the packaging as an OVA. We script this removal during the image build process. It needs to exist, but be empty. I'm not sure if manipulating this file after boot is a thing.
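The build-time step described above could look roughly like this. It is only a sketch: the function name `reset_machine_id` and the idea of passing the mounted image root as an argument are assumptions, not part of any Bitnami tooling.

```shell
# Empty (but keep) etc/machine-id under an image root directory.
# "$1" is the directory where the image's root filesystem is mounted.
reset_machine_id() {
    root=${1:?usage: reset_machine_id ROOT_DIR}
    # Truncate to zero bytes; the file must still exist afterwards.
    : > "$root/etc/machine-id"
}

# Example (mount point is hypothetical):
#   reset_machine_id /mnt/image-root
```

Truncating rather than deleting matches the requirement that the file exists but is empty when the image is packaged.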

swinster commented 2 years ago

OK, it looks as if emptying /etc/machine-id on a live VM (leaving it as a blank file) and then rebooting the machine does indeed generate a new machine ID, and thus a new client identifier in the DHCP Discover packet. This then leads to proper DHCP server operation.

I can't edit the /etc/machine-id within the vmdk in the OVA as this is likely a stream-optimised compressed image and is read-only. YOU guys need to do this in the build process.

marcosbc commented 2 years ago

Hi @swinster, just wanted to let you know that we have fixed this issue in the build process for our OVAs. We are already working on releasing all affected VMs, and expect most of them to be released throughout the weekend. Please note that it may take a bit longer for it to appear updated in the VMware Marketplace, since there is a specific review process for that.

In addition, for any user that wants to get this issue fixed in existing images, it is as simple as executing these commands:

sudo rm /etc/machine-id
sudo touch /etc/machine-id
sudo reboot
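A small sketch for checking the result after the reboot, assuming the old ID was noted beforehand with `cat /etc/machine-id`. The function name `machine_id_regenerated` is hypothetical; the check only relies on a machine ID being 32 lowercase hexadecimal characters.

```shell
# Succeeds if FILE holds a well-formed machine ID (32 lowercase hex digits)
# that differs from OLD_ID, the value noted before the reset.
machine_id_regenerated() {
    old_id=$1
    file=${2:-/etc/machine-id}
    new_id=$(cat "$file" 2>/dev/null) || return 1
    printf '%s\n' "$new_id" | grep -Eq '^[0-9a-f]{32}$' || return 1
    [ "$new_id" != "$old_id" ]
}

# Example (OLD_ID is whatever the VM reported before the reset):
#   machine_id_regenerated "$OLD_ID" && echo "machine-id was regenerated"
```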

nccerhostmaster commented 2 years ago

@swinster thanks for your efforts, much appreciated. I was able to fix my issues by clearing the machine-id.

swinster commented 2 years ago

Awesome, glad to be of help. TBH, I don't know too much about Linux OS or VM packaging an appliance, but I know some very clever people that do. Troubleshooting is more my field. Hopefully next time, you will dive a bit deeper and look at PCAPs and consult some RFCs - which is what your support should be able to do, but it is great that you have acted upon this swiftly once the problem was understood.

On another note, those very clever people I work with suggested that another file be removed, but that doesn't exist within your image - namely /var/lib/dbus/machine-id. It would appear that you don't use DBus in your images, which raised some eyebrows with those clever people previously mentioned. Whilst this is not marked as required or important, they suggested that modern Linux builds probably should use this for managing inter-process communication. Just thought I'd let you know.

recena commented 2 years ago

which is what your support should be able to do,

It is what we have been doing for years for free.
