dlenski / vpn-slice

vpnc-script replacement for easy and secure split-tunnel VPN setup
GNU General Public License v3.0
726 stars 87 forks source link

Nasty crash when there are non-UTF-8 characters in /etc/hosts #153

Closed bortzmeyer closed 4 weeks ago

bortzmeyer commented 1 month ago

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

  File "/usr/local/bin/vpn-slice", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/__main__.py", line 625, in main
    do_post_connect(env, args)
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/__main__.py", line 290, in do_post_connect
    providers.hosts.write_hosts(host_map, args.name)
  File "/usr/local/lib/python3.12/dist-packages/vpn_slice/posix.py", line 79, in write_hosts
    lines = hostf.readlines()
            ^^^^^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 775: invalid continuation byte

After finding (perl -ne 'print if /[^[:ascii:]]/' hosts ) and deleting them, it works. Ideally, I would like the file to be processed even if there are non-ASCII characters. Otherwise, a better error message would be nice.

dlenski commented 1 month ago

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

What language and what encoding is this? (Si je dois deviner, j'imagine que c'est du français representé en codage iso8559-1, d'où 'é'.encode('iso-8859-1').decode('utf-8')UnicodeDecodeError)

As the error message states, it's not complaining about non-ASCII bytes, it's complaining about non-UTF8 bytes.

dlenski commented 1 month ago

Ideally, I would like the file to be processed even if there are non-~ASCII~ UTF-8 characters. Otherwise, a better error message would be nice.

Want to write up a PR to process the /etc/hosts file as a binary file rather than a text file with an implied/assumed encoding? https://github.com/dlenski/vpn-slice/blob/master/vpn_slice/posix.py#L94-L116

gmacon commented 1 month ago

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

bortzmeyer commented 4 weeks ago

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

Indeed, but the composed characters were in the comments.

bortzmeyer commented 4 weeks ago

If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:

What language and what encoding is this? (Si je dois deviner, j'imagine que c'est du français representé en codage iso8559-1, d'où 'é'.encode('iso-8859-1').decode('utf-8')UnicodeDecodeError)

As the error message states, it's not complaining about non-ASCII bytes, it's complaining about non-UTF8 bytes.

You're right, if I use UTF-8 (like everyone should do), it works.

dlenski commented 4 weeks ago

I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.

Indeed, but the composed characters were in the comments.

Thanks, good clarifying question by @gmacon! This makes sense.

https://github.com/dlenski/vpn-slice/issues/153#issuecomment-2325271112 should indeed be the right solution. Does this patch do the trick to prevent crashes even if you have non-UTF-8 characters in comments, @bortzmeyer?

diff --git a/vpn_slice/posix.py b/vpn_slice/posix.py
index ca267cd..531e42c 100644
--- a/vpn_slice/posix.py
+++ b/vpn_slice/posix.py
@@ -99,14 +99,13 @@ class HostsFileProvider(HostsProvider):

     def write_hosts(self, host_map, name):
         tag = f'vpn-slice-{name} AUTOCREATED'
-        with open(self.path, 'r+') as hostf:
+        with open(self.path, 'r+b') as hostf:
             fcntl.flock(hostf, fcntl.LOCK_EX)  # POSIX only, obviously
             lines = hostf.readlines()
-            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n')]
+            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n'.encode())]
             hostf.seek(0, 0)
             hostf.writelines(keeplines)
-            for ip, names in host_map:
-                print(f"{ip} {' '.join(names)}\t\t# {tag}", file=hostf)
+            hostf.writelines(f"{ip} {' '.join(names)}\t\t# {tag}\n".encode() for ip, names in host_map)
             hostf.truncate()
         return len(host_map) or len(lines) - len(keeplines)
bortzmeyer commented 4 weeks ago

#153 (comment) should indeed be the right solution. Does this patch do the trick to prevent crashes even if you have non-UTF-8 characters in comments, @bortzmeyer?

Yes, perfect. Thanks.