Closed bortzmeyer closed 4 weeks ago
If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:
What language and what encoding is this? (If I had to guess, I'd say it's French encoded as ISO 8859-1, hence 'é'.encode('iso-8859-1').decode('utf-8') → UnicodeDecodeError.)
As the error message states, it's not complaining about non-ASCII bytes, it's complaining about non-UTF8 bytes.
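The distinction between non-ASCII and non-UTF-8 bytes can be reproduced in isolation. This is a minimal sketch, assuming the comment was saved as ISO 8859-1 (which is only the guess made above); the helper name try_decode is hypothetical and not part of vpn-slice.

```python
def try_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Return the decoded text, or a short description of the decode failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__} at byte offset {exc.start}"


latin1_bytes = "é".encode("iso-8859-1")  # a single byte, 0xE9
utf8_bytes = "é".encode("utf-8")         # two bytes, 0xC3 0xA9

# Both byte strings are non-ASCII, but only the first is invalid as UTF-8.
print(try_decode(latin1_bytes))
print(try_decode(utf8_bytes))
```

Both inputs contain non-ASCII bytes, yet only the ISO 8859-1 one fails, which matches the observation that the error is about UTF-8 validity rather than ASCII.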
Ideally, I would like the file to be processed even if there are non-~ASCII~ UTF-8 characters. Otherwise, a better error message would be nice.
Want to write up a PR to process the /etc/hosts file as a binary file rather than a text file with an implied/assumed encoding? https://github.com/dlenski/vpn-slice/blob/master/vpn_slice/posix.py#L94-L116
I'm kinda surprised that /etc/hosts works with IDNs... I would have expected the A-label (punycode) representation to be required.
Indeed, but the composed characters were in the comments.
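For readers unfamiliar with the A-label form mentioned in this exchange, here is a small illustration using Python's built-in idna codec (which implements the older IDNA 2003 rules). The hostname bücher.example is a made-up example, not something taken from the reporter's file.

```python
# Hypothetical internationalized hostname, used only for illustration.
u_label = "bücher.example"

# The built-in "idna" codec converts each non-ASCII label to its
# ASCII-compatible "xn--" (punycode) form.
a_label = u_label.encode("idna")
print(a_label)

# Decoding reverses the transformation.
print(a_label.decode("idna"))
```

The A-label is pure ASCII, so a hosts file that uses it for hostnames would not hit encoding problems; in this issue the accented characters were only in comments.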
You're right, if I use UTF-8 (like everyone should do), it works.
Thanks, good clarifying question by @gmacon! This makes sense.
https://github.com/dlenski/vpn-slice/issues/153#issuecomment-2325271112 should indeed be the right solution. Does this patch do the trick to prevent crashes even if you have non-UTF-8 characters in comments, @bortzmeyer?
diff --git a/vpn_slice/posix.py b/vpn_slice/posix.py
index ca267cd..531e42c 100644
--- a/vpn_slice/posix.py
+++ b/vpn_slice/posix.py
@@ -99,14 +99,13 @@ class HostsFileProvider(HostsProvider):
     def write_hosts(self, host_map, name):
         tag = f'vpn-slice-{name} AUTOCREATED'
-        with open(self.path, 'r+') as hostf:
+        with open(self.path, 'r+b') as hostf:
             fcntl.flock(hostf, fcntl.LOCK_EX)  # POSIX only, obviously
             lines = hostf.readlines()
-            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n')]
+            keeplines = [l for l in lines if not l.endswith(f'# {tag}\n'.encode())]
             hostf.seek(0, 0)
             hostf.writelines(keeplines)
-            for ip, names in host_map:
-                print(f"{ip} {' '.join(names)}\t\t# {tag}", file=hostf)
+            hostf.writelines(f"{ip} {' '.join(names)}\t\t# {tag}\n".encode() for ip, names in host_map)
             hostf.truncate()
         return len(host_map) or len(lines) - len(keeplines)
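The effect of the binary-mode approach can be checked outside vpn-slice. The sketch below is a standalone approximation of the patched logic, not the project's code: the function name rewrite_tagged_lines, the temporary file standing in for /etc/hosts, and the sample hostnames are all assumptions, and the fcntl locking is omitted to keep it portable.

```python
import os
import tempfile


def rewrite_tagged_lines(path, host_map, name):
    """Drop lines carrying the tag, append fresh ones, never decoding the rest."""
    tag = f"vpn-slice-{name} AUTOCREATED"
    suffix = f"# {tag}\n".encode()
    with open(path, "r+b") as hostf:
        lines = hostf.readlines()
        # Compare bytes with bytes, so untagged lines in any encoding pass through.
        keeplines = [line for line in lines if not line.endswith(suffix)]
        hostf.seek(0, 0)
        hostf.writelines(keeplines)
        hostf.writelines(
            f"{ip} {' '.join(names)}\t\t# {tag}\n".encode() for ip, names in host_map
        )
        hostf.truncate()
    return keeplines


# Temporary stand-in for /etc/hosts with an ISO 8859-1 comment (0xE9 is 'é').
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"# commentaire accentu\xe9\n127.0.0.1 localhost\n")
    tmp.write(b"10.0.0.1 old.example\t\t# vpn-slice-tun0 AUTOCREATED\n")

kept = rewrite_tagged_lines(tmp.name, [("10.0.0.2", ["new.example"])], "tun0")
with open(tmp.name, "rb") as f:
    result = f.read()
os.unlink(tmp.name)
print(result)
```

Because the file is never decoded, the ISO 8859-1 comment is copied through byte for byte, while the previously tagged entry is replaced by the new one.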
Yes, perfect. Thanks.
If the comments in /etc/hosts are in my native language, vpn-slice crashes with a scary message:
After finding the non-ASCII characters (perl -ne 'print if /[^[:ascii:]]/' hosts) and deleting them, it works. Ideally, I would like the file to be processed even if there are non-ASCII characters. Otherwise, a better error message would be nice.
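A Python counterpart to the perl one-liner can also tell apart lines that are merely non-ASCII from lines that are not valid UTF-8, which is the distinction that mattered in this issue. This is a sketch only; the function name find_problem_lines and the sample data are hypothetical.

```python
def find_problem_lines(data: bytes):
    """Yield (line_number, kind, raw_line) for every line that is not plain ASCII."""
    for number, line in enumerate(data.splitlines(), start=1):
        if line.isascii():
            continue
        try:
            line.decode("utf-8")
            kind = "non-ASCII but valid UTF-8"
        except UnicodeDecodeError:
            kind = "not valid UTF-8"
        yield number, kind, line


# Sample content; a real check would read the hosts file with open(path, "rb").
sample = (
    b"127.0.0.1 localhost\n"
    b"# caf\xc3\xa9 (UTF-8)\n"
    b"# caf\xe9 (ISO 8859-1)\n"
)
problems = list(find_problem_lines(sample))
for number, kind, line in problems:
    print(number, kind, line)
```

Only the lines reported as "not valid UTF-8" would have triggered the crash in text mode; the valid UTF-8 comment is harmless.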