DigitaleGesellschaft / Anonip

Anonip is a tool to anonymize IP-addresses in log-files.

Error with IPv6 short form notation + port #65

Closed TheLalaMan closed 2 years ago

TheLalaMan commented 2 years ago

When processing ErrorLog entries from Apache 2.4 in the following format (IPv6 short form notation + port)

2001:db8:1::ab9:C0A8:102:46824 [Wed Jul 06 21:28:43 2022] [error] [pid 68812] mod_proxy_fcgi.c(887): AH01071: Got error 'Primary script unknown'

I get the following error:

$ echo "2001:db8:1::ab9:C0A8:102:46824" | ./anonip.py
WARNING:__main__:'2a06' does not appear to be an IPv4 or IPv6 network
2a06:6440:0:2c80::1:46824

When I remove the port, it works:

$ echo "2001:db8:1::ab9:C0A8:102" | ./anonip.py
2001:db8::

This occurs with Python 3.8 and 3.9. Can this be fixed?

tolimar commented 2 years ago

I'm not sure this is a problem with anonip per se; it is probably more a problem with the log file format. See RFC 2732, Format for Literal IPv6 Addresses in URL's (or the short version on Wikipedia): to distinguish between the IPv6 address and the port in URLs, you are supposed to place the IPv6 address in square brackets [].

Anonip could make some guesses about when a string ends in a port rather than a hextet, but given that valid hextets and the port range overlap, that can never work reliably. So I think the log format should be fixed... However, that leads to another problem:

$ echo "2001:db8:1::ab9:C0A8:102 # this works" | anonip -6 64
2001:db8:1:: # this works
$ echo "[2001:db8:1::ab9:C0A8:102]:8080 # this doesn't" | anonip -6 64
[2001:db8:1::ab9:C0A8:102]:8080 # this doesn't
$ echo "[2001:db8:1::ab9:C0A8:102]:8080 # neither does this" | anonip -6 64
[2001:db8:1::ab9:C0A8:102]:8080 # neither does this

Given that IPv4 addresses with port seem to work with anonip, I kind of expected that IPv6 would work too.
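
To illustrate the overlap between hextets and ports mentioned above, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `parses_as_ipv6` is made up for this example and is not part of anonip.

```python
import ipaddress


def parses_as_ipv6(text):
    """Return True if `text` is accepted as an IPv6 address (hypothetical helper)."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False


base = "2001:db8:1::ab9:C0A8:102"

# 46824 has five digits, so it cannot be a hextet and parsing fails.
print(parses_as_ipv6(base + ":46824"))  # False

# 8080 is a valid port AND a valid hextet, so the whole string parses
# as a (different) IPv6 address and the port silently becomes part of it.
print(parses_as_ipv6(base + ":8080"))  # True
```

So whether an unbracketed trailing `:port` is detectable depends on the port number itself, which is why guessing cannot be made reliable.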

The best solution I can think of for now is to change the log format to comply with RFC 2732 and use the regex option to match IPv6 addresses, like this:

$ echo -e "127.0.0.1:80\n[2001:db8:1::ab9:C0A8:102]:8080" | anonip | anonip --regex '\[((([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))\].*'
127.0.0.0:80
WARNING:anonip:'1' does not appear to be an IPv4 or IPv6 network
WARNING:anonip:'None' does not appear to be an IPv4 or IPv6 network
[2001:db8::]:8080

Disclaimer: The above regular expression is based upon an answer on stackexchange. I don't claim to understand it, nor have I tested it ;)

open-dynaMIX commented 2 years ago

I absolutely agree with @tolimar's point about RFC 2732 compliance.

But this should work, so it's a bug:

$ echo "[2001:db8:1::ab9:C0A8:102]:8080 # this doesn't" | anonip -6 64

This happened due to the uppercase letters in the IP. I've just pushed a fix to main. So there's no need to use the proposed regex anymore, as long as the IPs comply with RFC 2732.
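
For readers who want to see what handling the bracketed form amounts to, here is a minimal sketch. It is not anonip's actual code: the pattern `BRACKETED` and the function `mask_bracketed` are assumptions made up for this example. The character class accepts both uppercase and lowercase hex digits, and the masking itself is left to the standard `ipaddress` module.

```python
import ipaddress
import re

# Hypothetical pattern: "[address]" with an optional ":port" suffix.
# A-F and a-f are both listed so uppercase hextets such as C0A8 match.
BRACKETED = re.compile(r"\[([0-9A-Fa-f:.]+)\](?::(\d+))?")


def mask_bracketed(line, prefix=64):
    """Mask bracketed IPv6 addresses in `line` down to `prefix` bits."""

    def _mask(match):
        try:
            net = ipaddress.ip_network(f"{match.group(1)}/{prefix}", strict=False)
        except ValueError:
            # Not a usable address: leave the text untouched.
            return match.group(0)
        port = f":{match.group(2)}" if match.group(2) else ""
        return f"[{net.network_address}]{port}"

    return BRACKETED.sub(_mask, line)


print(mask_bracketed("[2001:db8:1::ab9:C0A8:102]:8080 # this doesn't"))
# [2001:db8:1::]:8080 # this doesn't
```

Because the brackets delimit the address, the port never has to be guessed, which is the point of RFC 2732.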

BTW @tolimar: Thanks a lot for your recent activity in the repo! :rocket:

TheLalaMan commented 2 years ago

Well, I'm glad my issue did lead to an improvement in this repo, albeit in a different area ;-)

Regarding the RFC compliance of IPv6 addresses, I DID try to find where I could adjust the format that I get on the server, but this led me absolutely nowhere. I just managed to move the IP address into the first column. I would have expected Apache and its modules to be already compliant anyway...

Any pointers on where I can adjust the format of the ErrorLog IP addresses in Apache 2.4 on FreeBSD?

TheLalaMan commented 2 years ago

Just found this: the people at rsyslog were faced with the same problem being reported but were more considerate... ;-) "As this is in the wild and Apache is quite common, it looks like we need to add support for this form of port representation." (https://github.com/rsyslog/rsyslog/issues/4725#issuecomment-959021937)
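
A rough sketch of what supporting the unbracketed `address:port` form could look like, under the loud assumption that the field is known to always carry a port (as in the ErrorLog line above). The function `split_trailing_port` is hypothetical and is neither anonip's nor rsyslog's implementation.

```python
import ipaddress


def split_trailing_port(token):
    """Split 'addr:port', ASSUMING a port is always present (hypothetical helper)."""
    head, sep, tail = token.rpartition(":")
    # The last group must look like a decimal port number.
    if not sep or not (tail.isascii() and tail.isdigit()) or int(tail) > 65535:
        return token, None
    try:
        ipaddress.ip_address(head)  # the remainder must be a valid address
    except ValueError:
        return token, None
    return head, int(tail)


print(split_trailing_port("2001:db8:1::ab9:C0A8:102:46824"))
# ('2001:db8:1::ab9:C0A8:102', 46824)

# Without the assumption the heuristic misfires: here the last hextet
# of a port-less address is taken for a port.
print(split_trailing_port("2001:db8:1::ab9:C0A8:102"))
# ('2001:db8:1::ab9:C0A8', 102)
```

The second call shows why this only makes sense as an opt-in for log formats where the port is guaranteed to be there.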