OpenBeta / open-tacos

Rock climbing route catalog (openbeta.io)
https://openbeta.io
GNU Affero General Public License v3.0

Add Analytics #257

Open vnugent opened 2 years ago

vnugent commented 2 years ago

Evaluate a privacy-focused lib such as:

CocoisBuggy commented 1 year ago

It may help if we list out some specific metrics that we are after.

If I were to guess, we are probably after:

  1. Site traffic. This is the bread and butter of any good analytics and is supported by any good analytics system
  2. Traffic sources. We are probably not that interested in answering questions like "What search engine did this user get referred by", because the likelihood that we will need to optimize ad campaigns or get extremely granular with SEO is low. We probably are interested in geographical statistics, as the nature of our work is very much based on serving a global community.
  3. Privacy and trust. This goes without saying, but we don't care very much about tracking our users - if we did, we would use Google Analytics.
  4. Very low effort. With limited development resources, a system that provides little friction is probably best for us.
  5. We do care about answering questions like "What devices are visitors using", but these questions should be fairly easy to guess at in any event, so it's not that important.

The point of all of this, really, is that we are concerned with user usage statistics, but are not particularly interested in the finer details related to advertising.

With that in mind, Plausible seems more in line with our objectives than Fathom. Here is my reasoning:

Plausible is open source, while Fathom does not seem to offer self-hosting or make its source code available. The devs over at Elementary OS had some good insight. They mention Plausible's absence of advertiser-centric features, and also make a point of talking about ease of adoption:

Google Analytics is by far the most popular analytics solution available and there are few privacy-respecting alternatives. We considered a few self-hosted products like Matomo, but we couldn’t find a good fit in terms of maintenance, pricing, and features. We also investigated using self-described privacy-focused Fathom, but were disappointed to learn that they had gone closed-source since we’d last looked at them. The situation changed once we stumbled upon Plausible.

For me, a big plus is the privacy of this system in general. As mentioned here, the data captured for analysis is quite sanitary and useful in wide contexts.

All told, I would imagine that Plausible is probably better for us (though it is a few dollars more expensive, trending toward ~$5 more per price break).

@vnugent , do you think there are any other factors to consider?

vnugent commented 1 year ago

@CocoisBuggy thanks for the detailed evaluation. https://posthog.com/ was suggested in the duplicate issue (#857). Any thoughts on that?

CocoisBuggy commented 1 year ago

Posthog looks super powerful. The tracking can be considerably more involved. The price scaling is a little bit terrifying in my opinion... but maybe it's reasonable since anything under 1m events per month is free...

Even with the 50% off they offer to non-profits, posthog is quite a bit more expensive than plausible once we start reaching into the 2m events territory. "Events" in this case may also amount to more than simply the number of page views, so we have an opportunity there to sandbag ourselves by capturing too much data.
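
To make the "events are more than page views" point concrete, here is a rough back-of-the-envelope sketch. The traffic figures and the `estimateMonthlyEvents` helper are hypothetical, made up purely for illustration; only the 1m free-tier figure comes from the comment above.

```javascript
// Free-tier ceiling mentioned above (events per month).
const FREE_TIER_EVENTS = 1_000_000

// Hypothetical helper: every page view is one event, plus however many
// custom events we fire per view on average.
function estimateMonthlyEvents({ pageViews, customEventsPerView }) {
  return Math.round(pageViews * (1 + customEventsPerView))
}

// Made-up traffic numbers, NOT real OpenBeta figures.
const lean = estimateMonthlyEvents({ pageViews: 300_000, customEventsPerView: 0.5 })
const noisy = estimateMonthlyEvents({ pageViews: 300_000, customEventsPerView: 4 })

console.log(lean, lean <= FREE_TIER_EVENTS)   // 450000 true
console.log(noisy, noisy <= FREE_TIER_EVENTS) // 1500000 false
```

The same number of page views lands on either side of the free tier depending only on how many custom events we choose to capture.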

That said, there is more inside posthog than plausible. The session replay seems particularly useful, especially when it's used to record some error case. It strikes me that we could use error boundaries to record fatal errors in the field - BUT there is a development overhead there. I also like the user paths that they demo in their 20min pitch.

vnugent commented 1 year ago

As much as I'd like to use Plausible, $9/month for a basic use case, tracking page views, is a deal breaker at the moment. @CocoisBuggy what do you think if we go with the Posthog free tier?

CocoisBuggy commented 1 year ago

There are other options for keeping price down if we understand our requirements properly. I have always found ingesting access logs from a web-server like NGINX to be more than sufficient for working out the answers to specific questions provided that you know them.
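
As a minimal sketch of what the access-log approach could look like, assuming nginx is writing its default "combined" log format: the sample log lines, the paths in them, and the `parseLine` / `countPageViews` helpers are all hypothetical.

```javascript
// Matches nginx's default "combined" log format:
// ip - user [time] "METHOD path PROTO" status bytes "referrer" "user-agent"
const COMBINED = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"$/

// Parse one log line into an object, or return null if it does not match.
function parseLine(line) {
  const m = COMBINED.exec(line.trim())
  if (!m) return null
  const [, ip, time, method, path, status, referrer, userAgent] = m
  return { ip, time, method, path, status: Number(status), referrer, userAgent }
}

// Count successful (2xx) GET requests per path, ignoring the query string.
function countPageViews(logText) {
  const counts = new Map()
  for (const line of logText.split('\n')) {
    const entry = parseLine(line)
    if (!entry || entry.method !== 'GET') continue
    if (entry.status < 200 || entry.status >= 300) continue
    const path = entry.path.split('?')[0]
    counts.set(path, (counts.get(path) ?? 0) + 1)
  }
  return counts
}

// Hypothetical sample lines (documentation IP range, made-up paths).
const sample = [
  '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /crag/abc HTTP/1.1" 200 5123 "-" "Mozilla/5.0"',
  '203.0.113.8 - - [10/Oct/2023:13:56:01 +0000] "GET /crag/abc?utm=x HTTP/1.1" 200 5123 "https://example.com/" "Mozilla/5.0"',
  '203.0.113.9 - - [10/Oct/2023:13:57:12 +0000] "POST /api/graphql HTTP/1.1" 200 312 "-" "Mozilla/5.0"',
  '203.0.113.9 - - [10/Oct/2023:13:58:40 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
].join('\n')

console.log(countPageViews(sample))
```

This only sees requests that reach the server, so purely client-side interactions would not show up in it.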

My only concern with posthog is the way the price scales once we break out of the free tier (though the 50% off for non-profits will help us in this respect). What if we self-host plausible? It looks to have a dockerfile (source code) that should be swappable in the future if we try to go to the cloud. My anticipation is that the docker image would be an extremely small load, given that it's essentially a log file and a small elixir server.

Concerns aside, I imagine they are future concerns. We would probably be well served by posthog if we need it up and running right away.

P.S.: I can do some actual benchmarking on plausible to see how heavy it is, should we need it.

vnugent commented 1 year ago

@CocoisBuggy Let's go ahead with Dockerizing Plausible. We do want to track events, since some important interactions (the main search box, for example) are client-side, which I believe will not be easily trackable in the web logs. We do want to track that search box since it's one of the key metrics to present to partners/sponsors.

CocoisBuggy commented 1 year ago

Very easy to spin up, but the overall service is definitely a bit more extensive than I initially thought in terms of how many docker services it's asking for (4).

Basic events are actually supported, which I didn't spot before. It may not be sufficiently customizable for our purposes, though.

vnugent commented 1 year ago

@CocoisBuggy can you do a quick prototype to see if we can track when the search box is open? We may need to increase our Kubernetes compute power to accommodate the stack (additional $6-12/month). IMO at this point we'd be better served using their hosted service.

CocoisBuggy commented 1 year ago

Not for when the searchbox is open, but could definitely track when users open the search / click on the search. One way to do it is with the library, capturing custom properties:

// plausible api
trackEvent('download', { 
    props: { 
        query_entered: true,
        interval: 16000, // (ms) 
    } 
})

Looks to be more or less the same for posthog with their capture api, where you can dispatch one-shot events (though the payload seems totally open):

// posthog api
posthog.capture('searchbox-interval', {
    query_entered: true,
    interval: 16000, // (ms)
})

As an aside, posthog also has autocapture. This means interactions with ALL structured html elements will be captured - though this strikes me as one way to run out the free tier pretty quick.

One concern I have is that plausible just doesn't seem to have as much tooling for processing custom events as posthog does, so if we want to go this route and we're not psyched on self-hosting then posthog seems like the way.
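
Since the two calls above are so similar, one option is to keep the search-box tracking behind a tiny wrapper so the provider stays swappable. This is only a sketch: `createSearchTracker`, the adapter names, and the event names `search-opened` / `search-closed` are hypothetical, and the adapters assume the `trackEvent(name, { props })` and `posthog.capture(name, payload)` call shapes shown in the snippets above.

```javascript
// Hypothetical helper: records when the search box opens and, on close,
// how long it was open and whether a query was entered.
// `send(name, props)` is whatever analytics call we end up using.
function createSearchTracker(send, now = () => Date.now()) {
  let openedAt = null
  return {
    opened() {
      openedAt = now()
      send('search-opened', {})
    },
    closed(queryEntered) {
      if (openedAt === null) return // ignore a close without a matching open
      send('search-closed', { query_entered: queryEntered, interval: now() - openedAt })
      openedAt = null
    },
  }
}

// Adapters for the two call shapes shown above.
const viaPlausible = (trackEvent) => (name, props) => trackEvent(name, { props })
const viaPosthog = (posthog) => (name, props) => posthog.capture(name, props)

// Demo with a fake sender and a fake clock, so it runs without either library.
const sent = []
let clock = 0
const tracker = createSearchTracker((name, props) => sent.push({ name, props }), () => clock)
tracker.opened()
clock = 16000 // pretend 16 seconds pass
tracker.closed(true)
console.log(sent)
```

In the app, `send` would be `viaPlausible(trackEvent)` or `viaPosthog(posthog)`, and the UI code would not need to change if we switch providers later.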

vnugent commented 1 year ago

"Not for when the searchbox is open, but could definitely track when users open the search / click on the search."

I didn't phrase it well. Yea, that's what we want

"One concern I have is that plausible just doesn't seem to have as much tooling for processing custom events as posthog does, so if we want to go this route and we're not psyched on self-hosting then posthog seems like the way."

I think you can track goals and conversions? https://plausible.io/docs/goal-conversions

vnugent commented 1 year ago

Another challenge with doing our own hosting is that servers have to be in the EU. It will be costly for us to do that because the existing Kubernetes cluster is in the US. We'll have to get a separate EU-based VM.

https://matomo.org/blog/2020/07/storing-data-on-us-cloud-servers-dont-comply-with-gdpr/

vnugent commented 1 year ago

@CocoisBuggy Plausible has a 7-day trial. I can set up an account for you to try it out whenever you're ready.