AndreBL / ip6neigh

Giving local DNS names to IPv6 SLAAC addresses [OpenWrt/LEDE shell script]
GNU General Public License v2.0
116 stars 15 forks source link

awkNR: cmd. line:1: Unexpected token #16

Closed DanaGoyette closed 1 year ago

DanaGoyette commented 3 years ago

After ip6neigh has been running for a while, I start getting it erroring out when I try to list the hosts: awkNR: cmd. line:1: Unexpected token

Trying to debug it with sh -x, I'm not familiar enough with awk to know what it's trying to do.

root@OpenWrt:~# sh -x /usr/bin/ip6neigh list
+ . /lib/functions.sh
+ N='
'
+ _C=0
+ NO_EXPORT=1
+ LOAD_STATE=1
+ LIST_SEP=' '
+ reset_cb
+ '[' -z  ]
+ '[' -f /lib/config/uci.sh ]
+ . /lib/config/uci.sh
+ CONFIG_APPEND=
+ . /lib/functions/network.sh
+ readonly 'CMD_TOOL_VERSION=1.7.2'
+ readonly 'HOSTS_FILE=/tmp/hosts/ip6neigh'
+ readonly 'CACHE_FILE=/tmp/ip6neigh.cache'
+ readonly 'SERVICE_NAME=ip6neigh-svc.sh'
+ readonly 'SBIN_DIR=/usr/sbin/'
+ readonly 'SHARE_DIR=/usr/share/ip6neigh/'
+ readonly 'SERVICE_SCRIPT=/usr/sbin/ip6neigh-svc.sh'
+ readonly 'OUI_FILE=/usr/share/ip6neigh/oui.gz'
+ '[' list '=' --version ]
+ . /usr/lib/ip6neigh/ip6addr_functions.sh
+ IP6ADDR_LIB_VERSION=1.7.1
+ '[' list '=' --version ]
+ CMD=/usr/bin/ip6neigh
+ list_hosts  
+ check_running
+ is_running
+ pgrep -f ip6neigh-svc.sh
+ return 0
+ return 0
+ check_files
+ '[' -f /tmp/hosts/ip6neigh ]
+ '[' -f /tmp/ip6neigh.cache ]
+ return 0
+ grep -n '^#Discovered' /tmp/hosts/ip6neigh
+ cut -d : -f1
+ local 'ln='
+ echo '#Predefined hosts'
#Predefined hosts
+ awk 'NR>1&&NR<(-1) {printf "%-30s %s\n",$2,$1}' /tmp/hosts/ip6neigh
+ sort
+ echo -e '\n#Discovered hosts'

#Discovered hosts
+ awk 'NR> {printf "%-30s %s\n",$2,$1}' /tmp/hosts/ip6neigh
+ sort
awkNR: cmd. line:1: Unexpected token
DanaGoyette commented 3 years ago

I should note that the /tmp/hosts/ip6neigh file does not have either comment in it, it's just a list of IP addresses. If I manually add both headers, the ip6neigh list command unbreaks, but it seems to have a bunch of duplicate names, most of which are .TEMP.GUA (I changed TMP to TEMP in the config).

BigThunderSR commented 1 year ago

Just started using this and am seeing the same issue as well.

cvmiller commented 1 year ago

Hi All, can you supply the contents of you /tmp/hosts/ip6neigh file?

The awkNR is a line number function, where it is trying to act on the upper or the lower part of the file. It appears that for some reason it is not finding the separation point (between upper and lower).

BigThunderSR commented 1 year ago

Hi All, can you supply the contents of you /tmp/hosts/ip6neigh file?

The awkNR is a line number function, where it is trying to act on the upper or the lower part of the file. It appears that for some reason it is not finding the separation point (between upper and lower).

Here's what it looks like when the service first starts up:

#Predefined SLAAC addresses
fe80::6238:
fde0:f45c:
2600:xxxx:

#Discovered IPv6 neighbors
fe80::fb44:
fe80::f0af:
fe80::ef19:
fe80::d2de:
fe80::bc95:
fe80::8216:
fe80::683b:
fe80::6647:
fe80::60e0:
fe80::2e0:
fe80::2a6d:
fe80::22a1:
fe80::20c:
fe80::10d5:
fde0:f45c:
fde0:f45c:

Here's what it looks like when the service has been running for a short while (few minutes):

2600:xxxx:
fe80::60e0:
fde0:f45c:
2600:xxxx:
fde0:f45c:
fe80::20c:
fe80::6647:
fe80::fb44:
fde0:f45c:
2600:xxxx:

The entries stop being separated after a while and everything seems to get mashed into one continuous file.

cvmiller commented 1 year ago

Thanks, it is the continuous file that is causing the awk symbol problem. That will take a little more digging to understand why that is happening. I don't see this happening on my 23.05 router, but I suspect it is because I only have a few hosts on that router.

Have your truncated the file? the ULAs and GUAs are not complete addresses (no double colons)

Can you tell me which version of OpenWrt you are using. TIA

BigThunderSR commented 1 year ago

Well, I rebooted the router and the problem seems to have gone away for now. 🤷🏽‍♂️

I'll report back if it recurs.

Yes, I truncated the file since I didn't want to paste it verbatim.

BigThunderSR commented 1 year ago

Unfortunately, it's back again.

cvmiller commented 1 year ago

OK, no problem with you truncating the addresses, now that I know the SW isn't doing it. I note that you have shown 14 Discovered neighbours in the working file, and only 9 in the second file. Do you have more than 14 hosts on the network? I am trying to see how many hosts I need to recreate this problem.

BTW, what version of OpenWrt are you using?

BigThunderSR commented 1 year ago

Sorry for any confusion on the example file output. I was just trying to show what was happening in the formatting of the file, so the before and after aren't perfect representations of my system per se.

After the issue came back following the reboot per my previous response., I restarted the service a few moments before your latest response to coincidentally see if I can find the point at which the issue recurs. I'm currently at 116 entries and the issue hasn't come back yet. I'll provide an update if/when it recurs. From what I have noticed so far, it appears that the issue seems to occur when the file is rapidly changing (adding/removing entries at a high rate) vs. the total number of records in the file.

I'm running a snapshot release based on 19.07 with dnsmasq 2.85.

cvmiller commented 1 year ago

Because of the trickiness of this problem, I think I am going to have to see a log. Please enable logging in the /etc/config/ip6neigh file:

        # Enable logging of debug messages. Options are '0' for disabled, '1' to use
        # syslog or 'file name' if you want to log to a file instead.   
        Default: '1'                       
        option log              '/tmp/log/ip6neigh.log'

And then send in the log after the event happens. This log is likely to be large, so don't leave it running for hours this way.

BigThunderSR commented 1 year ago

Unfortunately, the problem is back and the list is down to ~70 entries.

Is there an easy way to anonymize the entries in the log file? I don't want to post it with all my IP and host information. Thanks.

BigThunderSR commented 1 year ago

I think it happens when this condition is hit due to discrepancies in the static IP lease vs. what the host responds with as its hostname vs. what may be listed in a user provided hosts file.

I see this:

Sun Oct 22 20:09:35 CDT 2023 Unknown host MYHOST1
MYHOST2 now has got a proper name. Replacing all entries.
Sun Oct 22 20:09:35 CDT 2023 Renamed host: MYHOST1
MYHOST2 to MYHOST2

<Snip>

Sun Oct 22 20:09:55 CDT 2023 Unknown host MYHOST1
MYHOST2 now has got a proper name. Replacing all entries.
Sun Oct 22 20:09:55 CDT 2023 Renamed host: MYHOST1
MYHOST2 to MYHOST2

I've managed to get it to trigger multiple times now by releasing/renewing the IPV6 address on the MYHOST1 (same as MYHOST2) machine..

cvmiller commented 1 year ago

Just to be clear, you have predefined hosts in /etc/config/dhcp? And in that definition, they have pre-defined DHCPv6 hostnames as well?

cvmiller commented 1 year ago

Unfortunately, the problem is back and the list is down to ~70 entries.

Is there an easy way to anonymize the entries in the log file? I don't want to post it with all my IP and host information. Thanks.

Since the log file is a text log file, after transferring the file to a laptop, you can easily do a search and replace your GUAs from something like 2001:db8:... to xxxx:yyyy:.... But I'll need enough of the address to understand what is being swapped in and out (deleted/added).

BigThunderSR commented 1 year ago

I found several ways to trigger the phenomenon and it all seems to point to the replace/rename portion of the code as being the smoking gun. Whenever the rename occurs for whatever reason, it seems to mangle the /tmp/hosts/ip6neigh file which is what triggers the awk failure reported in this issue.

I see this same pattern in the log file similar to that from yesterday regardless of which host caused the replace/rename to trigger:

Sun Oct 22 20:09:35 CDT 2023 Unknown host MYHOST1 MYHOST2 now has got a proper name. Replacing all entries. Sun Oct 22 20:09:35 CDT 2023 Renamed host: MYHOST1 MYHOST2 to MYHOST2

Sun Oct 22 20:09:55 CDT 2023 Unknown host MYHOST1 MYHOST2 now has got a proper name. **Replacing all entries.** Sun Oct 22 20:09:55 CDT 2023 **Renamed host**: MYHOST1 MYHOST2 to MYHOST2 Would you be able to figure out why the replace/rename portion of the code is mangling the /tmp/hosts/ip6neigh file regardless of which host or why that code path got triggered? Unfortunately, there's way too much PII in the log, so it's not feasible for me to sanitize that before uploading. Thanks for your help!
cvmiller commented 1 year ago

Without the log it is difficult to troubleshoot the problem.

I am wondering if you don't have a duplicate (or multiple duplicates) MAC address(es) on your network. I don't see why hostnames would change, and they even need to invoke the "renaming" function of ip6neigh.

Looking at the code (I am not the original author), the entries are not kept in a database, but rather just text files. In order to delete (or rename) a host, a temporary file must be created (using grep -v) and then the temp file is mv to the original file. This takes time.

If there is a duplicate MAC address on your network, then 2 hosts will fight over the MAC address, which would cause the need for the hosts to be renamed, and renamed, and then renamed.

It is easy to create duplicate MAC addresses if you are cloning VMs, for example.

You may want to check your ARP table on the router with the command: arp -n | sort -k 4

And then look for duplicate MACs

BigThunderSR commented 1 year ago

Yes, I agree that there are naming conflicts (but no MAC conflicts) within my network that is causing the rename to occur, but the rename shouldn't mangle the /tmp/hosts/ip6neigh file when the rename does need to occur due to any reason.

Also, the /tmp/hosts/ip6neigh file gets mangled if the rename occurs even with only 3 - 4 items in the list vs. hundreds.

Here's one way (out of many) that I managed to trigger the rename:

Hope this helps to try and replicate the triggering of the rename sequence for troubleshooting the rename portion of the code.

Thanks!

cvmiller commented 1 year ago

Visual inspection of the code reveals no issues. But now that I know that your host is sending HostnameA and you have in your /etc/config/dhcp a different name (e.g. hostnameB), let me set that up on a router and see if I can trigger the problem.

cvmiller commented 1 year ago

Another thought: If you remove (for testing only) the static DHCP config which changes the name, does the problem go away?

BigThunderSR commented 1 year ago

Another thought: If you remove (for testing only) the static DHCP config which changes the name, does the problem go away?

Yes, just removing the DUID entry is sufficient to prevent the phenomenon from triggering while keeping the custom hostname as-is for IPV4.

Thanks!

cvmiller commented 1 year ago

I have tried to re-create your issue, unfortunately, without success. But since you are running snapshot, it is not possible for me to run the exact setup you have. I am running:

PRETTY_NAME="OpenWrt 22.03.0-rc4"
dnsmasq - 2.86-13

I have setup a static name/ip/duid as you suggest running improved "rename host name" and I am not seeing the issue, it appears to be working as expected. The new code eliminates the need to create a temp file (since it is using sed edit in place option).

I recommend making a backup of your existing ip6neigh-svc.sh (in /usr/sbin) before putting this version onto the router (in the same location). Be sure to remove the '.txt' (github doesn't allow attaching '.sh' files)

ip6neigh-svc.sh.txt

BigThunderSR commented 1 year ago

I will test this over the weekend and get back to you. Thanks much!

BigThunderSR commented 1 year ago

@cvmiller, the updated version has been working great so far and thanks much!

cvmiller commented 1 year ago

Good to hear, my test bed also ran all weekend with out issues. I am going to mark this as fixed (v 1.7.2 for ip6neigh-svc.sh), and please feel free to reopen, should the problem re-arrise.