Thomas--F / BotTracker

BotTracker-Plugin for Piwik
GNU General Public License v3.0

No bots found when using import_logs.py #19

Closed godsyn closed 9 years ago

godsyn commented 9 years ago

I run the following hourly without issue.

/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access.log --idsite=1
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access3.log --idsite=3
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access4.log --idsite=4
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access5.log --idsite=5
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access6.log --idsite=6
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access7.log --idsite=7
/usr/bin/python2.7 /var/www/path/to/piwik/misc/log-analytics/import_logs.py --url=https://piwik-install-url.com/ --enable-reverse-dns --enable-static --enable-bots --enable-http-redirects --recorders=4 /path/to/log-access8.log --idsite=8
/usr/bin/php5 /var/www/path/to/piwik/console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --accept-invalid-ssl-certificate --url='https://piwik-install-url.com/'

Nothing is ever placed into the piwik_bot_db. The plugin is installed/activated, and all sites have the default list of bots active. BotTracker seems to never process imported data.

I've enabled the logging in ./plugins/BotTracker/BotTracker.php

......
        public function logToFile($msg)
        {
                $logActive = true;
                if ($logActive){
......

And this appears in ./tmp/logs/log.txt

......
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
[2015/04/25 23:21:34] SiteID: user Agent: Piwik/LogImport TS:2015-04-25 23:21:34 page:
......

Suggestions?

Thomas--F commented 9 years ago

When you import the log records, every entry is sent with the same UserAgent, "Piwik/LogImport". The plugin tries to find a bot by scanning the UserAgent for specific keywords. As you can see, the plugin cannot work with the log import, because the real UserAgent never reaches it. I'm sorry, but with the current version of Piwik, the plugin can only be used with the PHP API during the webpage hit.
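To illustrate why keyword scanning fails here, the following is a minimal Python sketch of the idea, not the plugin's actual PHP code. The keyword table and the `find_bot` function are hypothetical and exist only for this example.

```python
# Hypothetical keyword table for illustration only; this is not the
# plugin's real bot list.
BOT_KEYWORDS = {
    "Googlebot": "googlebot",
    "Bingbot": "bingbot",
    "Baiduspider": "baiduspider",
}


def find_bot(user_agent):
    """Return the name of the first bot whose keyword occurs in the
    user agent string, or None when no keyword matches."""
    ua = user_agent.lower()
    for name, keyword in BOT_KEYWORDS.items():
        if keyword in ua:
            return name
    return None


# A crawler hitting the page directly exposes its own user agent.
print(find_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))

# During a log import, every request carries the importer's user agent,
# so no keyword can ever match.
print(find_bot("Piwik/LogImport"))
```

The first call prints `Googlebot` and the second prints `None`, which corresponds to the log output above: every line shows "Piwik/LogImport", so nothing is ever recognised as a bot.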

godsyn commented 9 years ago

Can you think of an API call other than Tracker.isExcludedVisit that would be further down the chain of execution? The reason I ask is that this is doable via custom variables, as seen here: [image showing bots]

Thomas--F commented 9 years ago

Which API call are you thinking of?

I use "Tracker.isExcludedVisit" because that's the reason I wrote this plugin: to exclude Bots from the visitor-list.

Do you just want to mark the visits with a custom variable? Then they will stay in the list and blur the results of the "real" visitors.

godsyn commented 9 years ago

Understood. As it stands, there appears to be no way to use this plugin on large-scale sites (using imports / crons). With a custom variable filter I was able to remove the bots from the main stats. I'll live with that. Thank you for your time.
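For readers wondering what such a custom variable filter amounts to, here is a rough Python sketch. It assumes each visit is available as a dict with a `custom_variables` mapping and that bot hits were tagged with a variable named `Bot`. The record layout, the variable name, and the `split_bots` function are all hypothetical; this is not Piwik's schema or API.

```python
def split_bots(visits, bot_variable="Bot"):
    """Separate visits into (humans, bots) based on whether the
    hypothetical custom variable `bot_variable` is set on the visit."""
    humans, bots = [], []
    for visit in visits:
        # A visit without any custom variables counts as a human visit.
        if bot_variable in visit.get("custom_variables", {}):
            bots.append(visit)
        else:
            humans.append(visit)
    return humans, bots


# Made-up sample records for illustration.
sample_visits = [
    {"id": 1, "custom_variables": {"Bot": "Googlebot/2.1"}},
    {"id": 2, "custom_variables": {}},
    {"id": 3},
]

humans, bots = split_bots(sample_visits)
print([v["id"] for v in humans])
print([v["id"] for v in bots])
```

Unlike excluding the visit at tracking time, this approach keeps the bot visits stored and only hides them when reporting, which is the trade-off Thomas--F describes above.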