Geeklog-Core / geeklog

Geeklog - The Secure CMS.
https://www.geeklog.net

Add an option to anonymize IP addresses and APIs to handle them #1090

Closed: mystralkk closed this issue 2 years ago

mystralkk commented 3 years ago

Currently, Geeklog records IP addresses in the database and in log files as-is, but this is not always ideal or appropriate in countries where IP addresses are regarded as private data and therefore should not be stored directly. We should add an option to record no IP addresses at all and/or an option to anonymize IP addresses after a given amount of time has passed.
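One common way to anonymize an address, rather than dropping it entirely, is to zero out its host bits. The Python sketch below only illustrates that idea (Geeklog itself is written in PHP). The helper name `anonymize_ip` and the prefix lengths it keeps (/24 for IPv4, /48 for IPv6) are assumptions for illustration, not necessarily what Geeklog implements.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the host bits of an IP address (hypothetical helper).

    Assumption: keep the first 24 bits of an IPv4 address and the first
    48 bits of an IPv6 address; all remaining bits are set to zero.
    """
    ip = ipaddress.ip_address(address)  # raises ValueError on bad input
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("192.168.1.123"))        # 192.168.1.0
print(anonymize_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Keeping a network prefix instead of deleting the value leaves coarse information (for example, rough origin of spam) while removing the part that identifies a single host.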

mystralkk commented 3 years ago

Related issues are #438 and #1076.

mystralkk commented 3 years ago

Implemented with changeset 58bac940.

eSilverStrike commented 2 years ago

Just doing some testing on this and could not get it to work at first.

Looking at the code, it seems the Geeklog Cron has to be turned on (it is turned off by default). Is there any reason you made the IP anonymization depend on whether the cron schedule runs? If so, we should probably mention this in the config docs for the ip_anonymize config option.

I also notice it doesn't anonymize the log files (which makes sense). I think we should mention in the config docs that this only anonymizes IP addresses stored in the Geeklog database (in the tables that support this feature). To remove all IP data, the Geeklog log files and the web server log files must also be deleted.

On a side note, the other issue that prevented this from working is that the COM_onFrontpage function always returned false on my WAMP setup (which uses a hosts file to define the domains for the websites). $_SERVER['PATH_INFO'] doesn't exist, so comparing it to $_CONF['site_url'] always fails. Do you see a fix that works both on a server stack like WAMP and on a regular web server like Apache or IIS? I guess we also have to consider that the Geeklog website may not be in the document root.

mystralkk commented 2 years ago

The reason the anonymization depends on the cron schedule is that in some cases you don't want to anonymize IP addresses immediately, e.g., when using the SpamX plugin. I had assumed the cron schedule feature was always enabled, which is not the case. As you say, we have to mention this in the config docs.
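The delayed behaviour described here can be sketched as a job the scheduler runs periodically: only rows older than a retention delay are anonymized, so a plugin such as SpamX can still see the original addresses for a while. This is a hypothetical Python illustration; `LogRow`, `anonymize_old_rows` and `delay_seconds` are invented names, and the anonymizing function is passed in rather than assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LogRow:
    """Hypothetical stand-in for a database row that stores an IP."""
    ip: str
    timestamp: int  # Unix time when the row was recorded


def anonymize_old_rows(rows: List[LogRow], now: int, delay_seconds: int,
                       anonymize: Callable[[str], str]) -> int:
    """Anonymize the IPs of rows that are at least delay_seconds old.

    Returns the number of rows changed. Rows whose IP is already
    anonymized are left alone, so repeated cron runs are harmless.
    """
    changed = 0
    for row in rows:
        if now - row.timestamp < delay_seconds:
            continue  # too recent: keep the real IP for now
        masked = anonymize(row.ip)
        if masked != row.ip:
            row.ip = masked
            changed += 1
    return changed


rows = [LogRow("10.0.0.5", 1000), LogRow("10.0.0.6", 9000)]
print(anonymize_old_rows(rows, now=10000, delay_seconds=3600,
                         anonymize=lambda ip: "0.0.0.0"))  # 1
```

If the scheduler never runs, nothing is ever anonymized, which is exactly why the config docs need to say that the cron schedule must be enabled.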

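As you suggest, we should mention in the config docs that only IP addresses stored in the Geeklog database are anonymized, and that Geeklog and web server log files have to be deleted separately.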

Does COM_getCurrentURL() work well on your setup? If so, we should improve COM_onFrontPage() by using the function.
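A sketch of how a front-page check might work from the full current URL instead of `$_SERVER['PATH_INFO']`: the current path is taken relative to the path component of `site_url`, so it also works when the site lives in a subdirectory rather than the document root. This is Python for illustration only, with hypothetical names; it reproduces only the two original cases (`site_url` and `site_url/index.php`), ignores the scheme, and treats any query string as not the front page.

```python
from typing import Optional
from urllib.parse import urlsplit


def relative_path(current_url: str, site_url: str) -> Optional[str]:
    """Path of current_url below site_url's path, or None if outside it."""
    cur, site = urlsplit(current_url), urlsplit(site_url)
    if cur.netloc.lower() != site.netloc.lower():
        return None  # different host (scheme is deliberately ignored)
    base = site.path.rstrip("/")  # "" when the site is in the document root
    if cur.path != base and not cur.path.startswith(base + "/"):
        return None
    return cur.path[len(base):]


def on_front_page(current_url: str, site_url: str) -> bool:
    """True only for site_url and site_url/index.php without a query."""
    rel = relative_path(current_url, site_url)
    return rel in ("", "/", "/index.php") and urlsplit(current_url).query == ""


site = "http://localhost/geeklog"
print(on_front_page("http://localhost/geeklog/index.php", site))    # True
print(on_front_page("http://localhost/geeklog/article.php", site))  # False
```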

eSilverStrike commented 2 years ago

Makes sense re cron schedule.

I'll take a look and make those other changes mentioned as well.

eSilverStrike commented 2 years ago

Yes, COM_getCurrentURL() does work, so I will switch COM_onFrontPage to use it.

Found a few more problems though.

Since COM_onFrontPage is called at the top level of lib-common (for templates and the cron scheduler), the global variables that hold the page number and the current topic are not yet set at that point, so I will have to remove these checks (which, as far as I can see, are not really needed anyway).

Also, COM_onFrontPage originally checked only for

// site_url
// site_url/index.php

as the homepage. Technically, all of the pages below are the homepage as well:

// site_url/index.php?page=1
// * URL Rewrite and URL Routing with "index.php" enabled
// site_url/index.php/topic/-
// site_url/index.php/topic/-/1
// * URL Routing without "index.php" enabled
// site_url/topic/-
// site_url/topic/-/1
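The URL forms listed above could be matched along these lines. In this hypothetical Python sketch, `rel_path` is the URL path relative to `site_url`, `query` is the raw query string, and `topic_segment` defaults to `topic`; tolerating a trailing slash is an extra assumption not stated in the thread.

```python
import re
from urllib.parse import parse_qs


def is_homepage(rel_path: str, query: str = "", topic_segment: str = "topic") -> bool:
    """Match the homepage URL forms listed in the thread (hypothetical helper).

    rel_path is the URL path relative to site_url; query is the raw
    query string.
    """
    if rel_path in ("", "/"):
        return query == ""
    if rel_path == "/index.php":
        # site_url/index.php or site_url/index.php?page=1
        return query == "" or parse_qs(query) == {"page": ["1"]}
    if query:
        return False
    # [/index.php]/<topic_segment>/-[/1], with an optional trailing slash
    pattern = r"(?:/index\.php)?/" + re.escape(topic_segment) + r"/-(?:/1)?/?"
    return re.fullmatch(pattern, rel_path) is not None


for path, q in [("/index.php", "page=1"), ("/index.php/topic/-/1", ""),
                ("/topic/-", ""), ("/topic/news", "")]:
    print(path, q, is_homepage(path, q))
```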

I've added these checks. With URL Routing enabled, I had to retrieve the proper name for the "topic" variable from the URL Routing rule record in the database. Basically, I look for the record containing "/index.php?topic=@topic", take its rule "/topic/@topic" (or whatever it is), strip the "/" from the left and the "/@topic" from the right, and use what is left over as the name that identifies a topic URL (which also identifies the homepage when the topic ID is "-").
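The rule-name lookup described above might look like the following sketch, where a list of `(rule, route)` pairs stands in for the URL Routing records in the database. The function name, constants and sample rows are hypothetical; only the "/index.php?topic=@topic" pattern and the two stripping steps come from the comment.

```python
from typing import Iterable, Optional, Tuple

TOPIC_ROUTE = "/index.php?topic=@topic"
PLACEHOLDER = "/@topic"


def topic_segment_name(routes: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the URL segment that marks a topic URL, e.g. "topic".

    routes holds (rule, route) pairs standing in for the URL Routing
    records; returns None when no topic route is defined.
    """
    for rule, route in routes:
        if route != TOPIC_ROUTE:
            continue
        name = rule[1:] if rule.startswith("/") else rule  # strip "/" on the left
        if name.endswith(PLACEHOLDER):
            name = name[: -len(PLACEHOLDER)]  # strip "/@topic" on the right
        return name
    return None


sample = [("/article/@sid", "/article.php?story=@sid"),
          ("/topic/@topic", "/index.php?topic=@topic")]
print(topic_segment_name(sample))  # topic
```

Deriving the name from the stored rule, instead of hard-coding "topic", keeps the homepage check working if a site administrator renames the segment in the routing rules.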