arnoson / kirby-stats

Simple and privacy friendly web statistics for Kirby CMS.
MIT License

What about ignoring traffic from bots/crawlers? #12

Open bnomei opened 1 year ago

arnoson commented 1 year ago

Most bots should already be ignored: the plugin uses both Matomo's DeviceDetector and Jaybizzle's CrawlerDetect. I run it on my portfolio site for testing and nevertheless get some obvious bots. Do you have any idea how to improve this?

arnoson commented 1 year ago

There is an undocumented arnoson.kirby-stats.debug option that, when enabled, logs the user agent and path to give more information on where the bot detection is failing.

bnomei commented 1 year ago

For my pageview counter plugin I used a tracking pixel below the first render fold. Not sure if you want to go that way.

arnoson commented 1 year ago

Great solution, I will look into this (: I still like the simplicity of just using routes and, at least on my portfolio website, I have a lot of sub-pages that don't scroll at all. I have to think about how I could handle this.

grommasdietz commented 9 months ago

Similar to the pixel below the first render fold, there is another technique to filter bots by looking for user interaction, e.g. by loading a PNG or SVG on body:hover: https://herman.bearblog.dev/how-bear-does-analytics-with-css/

(It still needs an additional style element on each page, though.)

arnoson commented 9 months ago

Looks great @grommasdietz! I definitely think it needs some sort of client-side JS/CSS logic for bot filtering. Maybe the first step would be to create a tracking endpoint in this plugin to test these methods.

One thing I just realized, though, is that uBlock Origin blocks the tracking endpoint of the bearblog website. I'm not sure if this is because it is included in a block list or because of some rule based on the naming of the endpoint (including hit, ref, ...).

arnoson commented 9 months ago

Just checked and it is because the hit endpoint of bearblog is blocked by https://easylist.to/

arnoson commented 9 months ago

I'm currently experimenting with an API endpoint for tracking, and it seems a CSS-only approach doesn't work. This is because I don't want to hash or save any IP data and instead use the referrer to determine whether something is a visit or just a view (internal navigation inside the website). With the current route hook approach I can read the referrer, but when using an endpoint I would have to send any information I need. Right now I'm thinking about something like this as a start:

const isReload = performance.navigation.type === 1
if (!isReload) {
  const data = new FormData()
  data.append('path', location.pathname)
  data.append('referrer', document.referrer)
  navigator.sendBeacon('/stats/handle', data)
}
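
Note that performance.navigation is deprecated in favour of the Navigation Timing Level 2 API. The same reload check could be sketched as follows. The helper name isReload is hypothetical and not part of the plugin; the legacy property is kept only as a fallback for environments that expose no navigation entry:

```javascript
// Hypothetical helper (not part of the plugin): detect a page reload.
// `perf` defaults to the global `performance` object; it is a parameter
// so the function can also be exercised outside a browser.
function isReload(perf = globalThis.performance) {
  if (!perf) return false
  // Navigation Timing Level 2: the entry's `type` is one of
  // 'navigate', 'reload', 'back_forward' or 'prerender'.
  const entries =
    typeof perf.getEntriesByType === 'function'
      ? perf.getEntriesByType('navigation')
      : []
  if (entries.length > 0) return entries[0].type === 'reload'
  // Legacy fallback: performance.navigation.type === 1 means reload.
  return Boolean(perf.navigation) && perf.navigation.type === 1
}
```

In the snippet above, `const isReload = performance.navigation.type === 1` would then become a call to `isReload()`.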

Additionally, we could trigger the endpoint only if a certain event happens or after a timeout of, say, 5 seconds. GoatCounter's count.js might be a helpful resource.
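
A minimal sketch of that idea, assuming a hypothetical helper onEngagement and an illustrative event list (neither is taken from the plugin or from GoatCounter). It runs the callback once, on the first interaction or after the timeout, whichever comes first:

```javascript
// Sketch only: run `callback` exactly once, either on the first of the
// given events or after `timeoutMs`, whichever happens first.
// `target` is any EventTarget (e.g. `document` in a browser).
// The default event names and the 5000 ms timeout are assumptions.
function onEngagement(
  target,
  callback,
  { events = ['pointermove', 'keydown', 'scroll'], timeoutMs = 5000 } = {}
) {
  let done = false
  const fire = () => {
    if (done) return
    done = true
    // Clean up both triggers so the callback cannot run twice.
    clearTimeout(timer)
    events.forEach((e) => target.removeEventListener(e, fire))
    callback()
  }
  events.forEach((e) => target.addEventListener(e, fire))
  const timer = setTimeout(fire, timeoutMs)
}

// Browser usage, reusing `data` from the snippet above:
// onEngagement(document, () => navigator.sendBeacon('/stats/handle', data))
```

Waiting for an interaction filters out clients that never move a pointer or press a key, while the timeout still counts readers who open a page and do not interact with it.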

grommasdietz commented 9 months ago

I’m definitely not up on best practices for this topic and don’t have the insights you have. While I prefer a way of handling statistics without additional images, CSS or JS, shouldn’t it still be possible to trigger any PHP logic by returning the image from a simple route?

'routes' => [
  [
    'pattern' => 'statistics/(:all).svg',
    'action'  => function ($all) {
      $path = $all == '' ? option('home', 'home') : $all;
      $page = page($path);

      if (!$page) {
        return page('error');
      }

      // Handle necessary plugin logic

      $content = '<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"></svg>';

      return new Response($content, 'image/svg+xml');
    },
  ],
],

The Kirby snippet could look like:

<style>
  body:hover {
    border-image: url("/statistics<?= Url::short($page->url()) ?>.svg");
  }
</style>

arnoson commented 9 months ago

The problem with this is that we lose the referrer and therefore can't distinguish between a view and a visit. Most analytics tools I know of use the hashed IP address to do this instead. We could send the referrer with PHP:

<style>
  body:hover {
    border-image: url("/statistics<?= Url::short($page->url()) ?>/<?= $_SERVER['HTTP_REFERER'] ?>.svg");
  }
</style>

but this won't work with caching. So I guess it is either sending the referrer with JS or using another method to distinguish views from visits. But maybe I'm missing something.
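
For illustration, the referrer-based distinction could be sketched like this. The function name classifyHit and the exact rule (empty or external referrer counts as a visit, same-host referrer counts as a view) are assumptions drawn from the description in this thread, not the plugin's actual code:

```javascript
// Illustration only; the name and the rule below are assumptions.
// "view"  = internal navigation (referrer is on the same host)
// "visit" = anything else (no referrer, or a referrer on another host)
function classifyHit(referrer, siteHost) {
  if (!referrer) return 'visit'
  try {
    return new URL(referrer).host === siteHost ? 'view' : 'visit'
  } catch {
    // Unparseable referrer: treat it like an external one.
    return 'visit'
  }
}

// In a browser the inputs would be document.referrer and location.host.
```

Because document.referrer is read at request time in the browser, it is not affected by an HTML cache on the server.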

grommasdietz commented 9 months ago

Ah okay! So even when adding a random hash to the border image on each page load to avoid caching, the HTML still gets cached and the image/navigation won’t be recognised?

Just hopped on to the discussion after finding out about the technique used by bearblog. I’m sure you’ll find a good way to improve the plugin logic.

Thanks for your work, looking into the ideas and explaining your considerations!

arnoson commented 9 months ago

Yes, I meant the Kirby HTML cache. If it is enabled, the referrer part <?= $_SERVER['HTTP_REFERER'] ?> in my version of your SVG example will also be HTML-cached, and therefore a stale referrer will be sent to the route. So yeah, maybe a super simple script is the best option. This would also allow adding some additional logic to filter bots in the future. Thanks for your input and interest in this plugin :) It motivates me to continue the development now that other people want to use it too!

arnoson commented 9 months ago

I released a new version that provides an endpoint (/kirby-stats/hit) and a simple script to call it after user interaction.

Edit: Composer/Packagist didn't pick up v0.0.7, so I had to release v0.0.8, which is basically the same.

grommasdietz commented 9 months ago

Looks promising. Just to be sure I get it: we have to call the script's function each time we load a page, like on an AJAX request, right? I think the removeEventListeners function has to be slightly corrected:

const removeEventListeners = () =>
  events.forEach((e) => document.removeEventListener(e, sendStats, eventOptions))

1. The function used addEventListener, probably a mistake?
2. Not sure if necessary, but it’s safer to include the eventOptions on removal as well:

    It's worth noting that some browser releases have been inconsistent on this, and unless you have specific reasons otherwise, it's probably wise to use the same values used for the call to addEventListener() when calling removeEventListener().

    MDN web docs
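
One way to keep registration and removal in sync is to create both from the same arguments. This is a sketch under assumptions: the helper listenAll is hypothetical, and the event names and options in the usage comment are placeholders, since the plugin's actual values are not shown in this thread:

```javascript
// Sketch: register one handler for several events and return a cleanup
// function that removes it again with the exact same options value,
// so addEventListener() and removeEventListener() can never diverge.
function listenAll(target, events, handler, options) {
  events.forEach((e) => target.addEventListener(e, handler, options))
  return () =>
    events.forEach((e) => target.removeEventListener(e, handler, options))
}

// Browser usage (event names and options are placeholders):
// const removeEventListeners = listenAll(
//   document, ['pointermove', 'keydown'], sendStats, { capture: true }
// )
```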

Created a pull request!