allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

GDPR compliance and anonymize-ip #2282

ghost closed 2 years ago

ghost commented 2 years ago

Hi,

Thanks for developing and maintaining goaccess; we have been using it for about two years now. It has changed a lot since then, and we’re glad to see that it’s still evolving!

A word about the --anonymize-ip feature. It currently operates by hiding the last 8 bits for IPv4 addresses, and the last 64 bits for IPv6 addresses.

The French CNIL, the authority enforcing GDPR in France, recommends [1] [2] hiding the last 16 bits of IPv4 addresses to ensure GDPR compliance.

Regarding IPv6 addresses, no official directive has been issued to my knowledge, though Google Analytics hides the last 80 bits. Contributors from the Dutch Internet Standards Platform suggested that even 80 bits would not be enough, because ISPs usually give customers a /48 prefix, which is exactly what remains visible after hiding 80 bits; 96 bits may be a good compromise. In any case, 64 bits is insufficient: it is very likely that the "anonymized" IPv6 address will still uniquely identify a client (see matomo-org/matomo#18301).

Here’s a table to sum up:

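| Bits hidden | goaccess | Google Analytics | CNIL recommends | DISP suggests |
|-------------|----------|------------------|-----------------|---------------|
| IPv4        | 8        | 8                | 16              | 16            |
| IPv6        | 64       | 80               | ???             | 96            |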

Is it possible to implement a feature to customize the number of anonymized bits? Or to add a parameter to enforce stronger anonymization levels?
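To make the request concrete, here is a minimal Python sketch of what "customizing the number of anonymized bits" could mean. It is not GoAccess code: the function name `anonymize_ip` and its parameters are hypothetical, and the defaults (8 and 64 bits) only mirror the current behaviour described above.

```python
import ipaddress


def anonymize_ip(addr, v4_hidden_bits=8, v6_hidden_bits=64):
    """Zero the trailing bits of an IP address (illustrative helper, not GoAccess's API)."""
    ip = ipaddress.ip_address(addr)
    hidden = v4_hidden_bits if ip.version == 4 else v6_hidden_bits
    if not 0 <= hidden <= ip.max_prefixlen:
        raise ValueError(f"cannot hide {hidden} bits of an IPv{ip.version} address")
    # Shifting right then left clears the lowest `hidden` bits.
    masked = (int(ip) >> hidden) << hidden
    # Rebuild with the original class so a small IPv6 value is not read as IPv4.
    return str(type(ip)(masked))


if __name__ == "__main__":
    # Documentation-range addresses, masked with the bit counts from the table.
    for v4_bits, v6_bits in [(8, 64), (16, 80), (16, 96)]:
        print(v4_bits, anonymize_ip("192.168.123.45", v4_hidden_bits=v4_bits))
        print(v6_bits, anonymize_ip("2001:db8:1234:5678:9abc:def0:1234:5678",
                                    v6_hidden_bits=v6_bits))
```

With 96 hidden bits only the first 32 bits of an IPv6 address survive, which is a shorter prefix than the /48 typically delegated to a single customer.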

For the past two years we have been running sed -r -i 's/(((1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.){2})1?([0-9][0-9]?|2[0-4][0-9]|25[0-5])\.0/\10.0/g' on the HTML file to further anonymize IPv4 addresses, but now that we accept IPv6 traffic, it might be simpler to implement this feature directly in GoAccess. :smile:
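For comparison, the same kind of post-processing workaround extended to IPv6 might look like the Python sketch below. It is only an illustration under assumptions: the `strengthen` function and the `CANDIDATE` pattern are hypothetical, the defaults (16 and 96 bits) are the strongest values from the table, and any token that happens to parse as an IP address (for example a four-part version number) would be rewritten too.

```python
import ipaddress
import re

# Runs of hex digits, dots and colons that are not glued to other such characters.
CANDIDATE = re.compile(r"(?<![0-9A-Za-z:.])[0-9A-Fa-f:.]{2,}(?![0-9A-Za-z:.])")


def strengthen(text, v4_hidden_bits=16, v6_hidden_bits=96):
    """Re-mask every token in `text` that parses as an IP address (illustrative only)."""

    def mask(match):
        token = match.group(0)
        try:
            ip = ipaddress.ip_address(token)
        except ValueError:
            return token  # not an IP address, leave it untouched
        hidden = v4_hidden_bits if ip.version == 4 else v6_hidden_bits
        keep = ip.max_prefixlen - hidden
        # strict=False lets ip_network() zero the host bits for us.
        network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
        return str(network.network_address)

    return CANDIDATE.sub(mask, text)


if __name__ == "__main__":
    print(strengthen("hosts: 203.0.113.0 and 2001:db8:1234:5678:: seen"))
```

Masking inside the analyzer avoids the false-positive risk entirely, which is one more argument for a built-in option.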

Cheers!

Neil

allinurl commented 2 years ago

Thanks for sharing that feedback and glad to hear you guys have been using it for some time.

That sounds totally doable. As you said, I think the best approach is an option where you can specify the level, e.g., 1, 2, or 3. It shouldn't be too bad; it's mostly a matter of adjusting the mask in this piece of code. I can take a look at it, or if you are familiar with C and up for the challenge, please feel free to submit a PR :)
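As a rough illustration of the "level" idea, the Python sketch below maps a level to a number of trailing bytes to zero in the packed address. The `LEVELS` table and the `anonymize_by_level` name are assumptions made for this example; only level 1 reflects behaviour stated in this thread, and the sketch does not claim to match the C code or the PR that was eventually merged.

```python
import ipaddress

# Hypothetical mapping: level -> (IPv4 bytes hidden, IPv6 bytes hidden).
# Level 1 is the 8-bit / 64-bit behaviour described in this thread;
# levels 2 and 3 are illustrative guesses, not confirmed GoAccess values.
LEVELS = {
    1: (1, 8),   # 8 bits  / 64 bits
    2: (2, 10),  # 16 bits / 80 bits
    3: (3, 12),  # 24 bits / 96 bits
}


def anonymize_by_level(addr, level=1):
    """Zero the trailing bytes of the packed address according to `level`."""
    if level not in LEVELS:
        raise ValueError(f"unknown anonymization level: {level}")
    ip = ipaddress.ip_address(addr)
    raw = bytearray(ip.packed)  # 4 bytes for IPv4, 16 bytes for IPv6
    hidden = LEVELS[level][0 if ip.version == 4 else 1]
    raw[len(raw) - hidden:] = bytes(hidden)  # overwrite the tail with zeros
    return str(ipaddress.ip_address(bytes(raw)))


if __name__ == "__main__":
    for level in sorted(LEVELS):
        print(level,
              anonymize_by_level("198.51.100.77", level),
              anonymize_by_level("2001:db8:aaaa:bbbb:cccc:dddd:eeee:ffff", level))
```

Working on whole bytes keeps the mask logic trivial, at the cost of only supporting bit counts that are multiples of 8.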

ghost commented 2 years ago

The PR proposed by @pitilux has been tested in our production environment with goaccess’ master branch, and works like a charm.

Thanks to both of you! Closing the issue.

allinurl commented 2 years ago

Thank you for the PR again and for posting an update. Happy to hear it's working fine :)