asyncapi / community

AsyncAPI community-related stuff.
https://www.asyncapi.com/community

Measure AsyncAPI Adoption #879

Open fmvilas opened 9 months ago

fmvilas commented 9 months ago

Problem

Since the inception of AsyncAPI, we've been driving the project based on our own opinions and perception of reality. We don't really have visibility into what users are doing with our tools and with the spec. Therefore, it's always hard to guess whether a feature is a success or a complete failure (or somewhere in the middle :P). That applies to both the spec and the tools.

Solution

We should start measuring the usage of our tools. It is super important that we don't track any private data (including IPs). Whatever metrics we get, they should be available on our website so anyone can consume them.

The solution should be able to:

  1. Measure unbounded properties: "asyncapi validate has been executed 1623 times this month", "60% of the documents are using version 2.4.0", etc.
  2. Measure user-bounded properties: "From all the users doing asyncapi validate successfully, 40% run asyncapi generate next, 20% run asyncapi validate again, and the rest simply stop there." In other words, a funnel. In any case, the user should not be represented by any private data.
  3. Show a prominent notice: It should note we're measuring anonymous usage and offer a simple way to disable it.
  4. Offer a way to disable it: using a command or an environment variable for the CLI, and a configuration option or an environment variable in the case of Studio.
  5. Showcase the metrics publicly on the website.
  6. Measure usage in CLI and Studio.
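
As a rough illustration of point 4, here is a minimal sketch of an opt-out check for the CLI. The environment variable name `ASYNCAPI_METRICS` and the `MetricsConfig` shape are assumptions made up for this example; the issue does not define either of them.

```typescript
// Hypothetical names: ASYNCAPI_METRICS and MetricsConfig are illustrative only.
interface MetricsConfig {
  // Would be written by a (hypothetical) CLI command that lets users opt out.
  analyticsEnabled?: boolean;
}

type Env = Record<string, string | undefined>;

const FALSY_VALUES = ["0", "false", "off", "no"];

// Metrics are on by default. Either the environment variable or the
// persisted config can switch them off; the environment variable wins.
function isMetricsEnabled(env: Env, config: MetricsConfig = {}): boolean {
  const raw = env["ASYNCAPI_METRICS"];
  if (raw !== undefined) {
    return FALSY_VALUES.indexOf(raw.trim().toLowerCase()) === -1;
  }
  return config.analyticsEnabled !== false;
}
```

In a Node-based CLI this could be called as `isMetricsEnabled(process.env, loadedConfig)` before any metric is recorded, so that a disabled client never sends a request at all.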

Rabbit holes

  1. The set of metrics will be hard coded. That means that for new metrics to be rolled out, we need the users to update their clients (especially CLI). That's completely fine. Let's not try to invent a super sophisticated way to auto-update the set of metrics dynamically. At least not in this first iteration.
  2. Let's try to ship something simple as soon as possible and iterate on it. Collecting metrics can be an endless topic.
  3. When evaluating a solution to store the metrics (New Relic, Google Analytics, etc.), make sure you don't spend too much time deciding. It can be an energy drainer.

Scope

Out of bounds

  1. Defining a first set of metrics. I'll define them.

Success criteria

  1. If we start getting usage metrics, that's already a success. I'm expecting to get hundreds of hits per day, but it's hard to know since we don't have metrics yet 😉

Amzani commented 9 months ago

@fmvilas I suggest targeting only the CLI, as we might create a library out of this that can be used by other tools.

fmvilas commented 9 months ago

I suggest we target Studio instead since we'll make sure the library is browser-compatible too. I can see three libraries emerging from this work: one that's aware of the filesystem and another that's aware of the browser capabilities. In both cases, they share another one that's in charge of communicating with the metrics endpoint and sending the information in the right way.

derberg commented 9 months ago

I think it would be better to convert this into a discussion. Since we enabled discussions, issues in the community repo usually relate to work on the community repo itself.

smoya commented 9 months ago

> I think it would be better to convert this into a discussion.

I agree. The point is that, afaik, Shape It does not have support for GH Discussions. I'm happy to either convert this into a discussion (you @derberg can do that, right?) and create a new issue for Shape It tracking, or rather the opposite.

smoya commented 9 months ago

Regardless of where (which project) we start collecting metrics from, I'm dropping here some caveats and ideas about showing the metrics publicly on the AsyncAPI website, which is the feature I think needs some upfront investigation.

1. API rate limits

Regardless of using one service or another to collect metrics, they will have rate limits for queries.
Let's assume that we go with New Relic (where AsyncAPI has a free tier account).

TL;DR:

All of those limitations go away if we avoid querying real-time metrics. Instead, we collect those metrics periodically, store them in a cache/DB/filesystem, and make the AsyncAPI website fetch metrics from there rather than from the metrics provider API directly. That implies a "product" decision of not having real-time metrics, but I believe that is completely acceptable. Would anyone expect those metrics to be shown in real time?

The technical details about how to achieve this architectural design are soon to come, but I just wanted to drop this here.
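
Until those details arrive, the cache idea from the TL;DR can be sketched roughly as follows. This is only an illustration under assumptions: the class name, the `refresh` helper and the shape of the stored data are hypothetical, and the provider query is injected so no real New Relic API is implied.

```typescript
// Hypothetical sketch: the website reads from a snapshot that a periodic
// job refreshes, so page views never hit the metrics provider directly.
class MetricsSnapshotCache<T> {
  private snapshot: { data: T; fetchedAt: number } | undefined;
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  // Called by the periodic job after it has queried the provider.
  store(data: T, now: number = Date.now()): void {
    this.snapshot = { data, fetchedAt: now };
  }

  // Called by the website. Returns undefined until the first refresh,
  // and flags snapshots older than maxAgeMs as stale instead of failing.
  read(now: number = Date.now()): { data: T; stale: boolean } | undefined {
    if (this.snapshot === undefined) return undefined;
    const stale = now - this.snapshot.fetchedAt > this.maxAgeMs;
    return { data: this.snapshot.data, stale };
  }
}

// The only place that talks to the provider; run it on a schedule
// (cron, serverless timer, etc.). queryProvider is an injected assumption.
async function refresh<T>(
  cache: MetricsSnapshotCache<T>,
  queryProvider: () => Promise<T>,
): Promise<void> {
  cache.store(await queryProvider());
}
```

With this shape, the number of provider queries depends only on the refresh schedule, not on website traffic.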

New Relic API rate limits in detail

New Relic has its own query APIs, which differ from the ingest ones. In fact, it has two: the one they promote, which is NerdGraph (GraphQL), and a REST API.
The latter, as it is kind of deprecated, does not support many operations, such as NRQL queries (queries to the New Relic backend), meaning it is useless for our use case. So let's focus on the NerdGraph API.

The rate limit for NerdGraph is 25 concurrent requests per user. That means no more than 25 requests to that API can be in flight at the same time. If we query NerdGraph on demand each time the new AsyncAPI website page (the one that will show the metrics) is requested, only 25 concurrent users will be supported; the rest will time out and won't see metrics. Not a big deal assuming the traffic won't be that high for now, but it eventually could be, and this is certainly not resilient enough.

Additionally, there is another rate limit in place: the NRQL rate limit. This one is way more complex because it is a combination of:

2. Metric widgets UI component

For this part, I would love to find community members willing to work on the UI part. Any suggestion is more than welcome.

As a side note, New Relic provides a React component that lets you show their metrics by using their widgets. See https://developer.newrelic.com/build-apps/

smoya commented 9 months ago

For illustration purposes, I'm sharing a mermaid chart with the very big picture of the architecture this solution could have. It assumes we use NewRelic as the provider, but it could be any other.

---
title: Measure AsyncAPI Adoption - big picture
---
flowchart LR;
subgraph Metrics visualization
    NR[NewRelic]-- metrics --> AsyncAPIWebsite
    AsyncAPIWebsite -- query metrics --> NR[NewRelic]
end
subgraph Metrics collection
    Studio & CLI & Others-- metrics --> NewRelic
end

Considering the API rate limits any provider will have in place (such as NewRelic, as I wrote in my previous comment), a "cache layer" should be in place. Again, no technical details about the implementation yet (it could be a service, a proxy, a serverless function...).

For some reason, the GH mermaid lib is lagging behind the latest releases and does not support rich text, so I'm pasting the image instead:

[Image: mermaid-diagram-2023-10-05-231224]

Amzani commented 9 months ago

@smoya is the New Relic free tier enough to handle all our needs, or do we need to subscribe to a paid plan?

smoya commented 9 months ago

> @smoya is the New Relic free tier enough to handle all our needs, or do we need to subscribe to a paid plan?

The only constraint we should be aware of is data retention. In our case, our metrics retention (dimensional metrics/custom events) is 30 days for all raw data points. However, aggregated data retention is 13 months. See https://docs.newrelic.com/docs/data-apis/manage-data/manage-data-retention/#dimensional-metrics

This means we cannot see every individual data point sent more than 8 days ago in full detail, but we can see 13 months of aggregated data (at 1-minute resolution, for example). This is completely fine for us, as we do not really care about such granularity.

smoya commented 9 months ago

After a conversation on Slack, we concluded that we could simply show New Relic widgets directly from the public URLs that NR provides for each dashboard widget. That can be set up through the UI of a NR dashboard. Each widget gives you a public URL that, when requested, shows the widget as in the following screenshot:

[Image: Google Chrome_Ip2izfNh]

Embedding those instead of querying the New Relic API to fetch metrics simplifies the architecture a lot: we do not need that intermediate cache layer, and we are not affected by the API rate limits.

That means the big picture would now look like this:

---
title: Measure AsyncAPI Adoption - big picture
---
flowchart LR;
subgraph Metrics visualization
    NR[NewRelic]-- embeddable widgets --> AsyncAPIWebsite
    AsyncAPIWebsite -- widgets public URL --> NR[NewRelic]
end
subgraph Metrics collection
    Studio & CLI & Others-- metrics --> NewRelic
end

smoya commented 9 months ago

There is one additional concern: our clients (Studio, CLI, ...) will expose the New Relic API key (License Key) used for sending metrics, both in source code (except for web apps like Studio) and in the requests themselves (visible by inspecting network traffic).

This leaked secret could be used against us by anyone who wants to send arbitrary data to our New Relic account. I don't think it is necessary to go into detail about the possible consequences. See security practices https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#security-practices

There is one alternative solution we could implement, but it complicates the design a bit: adding an intermediate service we own that would be in charge of forwarding the metrics to New Relic. This service would be the one holding the API key, and clients (Studio, CLI, etc.) would send the metrics to that service instead of to New Relic directly.

Users might still hit that service by re-sending the same requests the client does, and pollute the metrics, but they won't be able to send any other different metrics or any other kind of data or operation over New Relic than the ones we allow on that service. Also, we could easily implement a check on the referer, if present, to allow only Studio domain to execute a request, limiting in that way the possibility of damage to CLI and any other non-web app.
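
To make the checks described above concrete, here is a minimal sketch of the validation such a forwarder might run before relaying anything to New Relic. The metric names, the allowed origin and all function and type names are assumptions for illustration; none of them is an agreed design.

```typescript
// Hypothetical allow-lists: only known metrics, and (for browser clients)
// only known origins. Both lists are illustrative assumptions.
const ALLOWED_METRICS = ["action.executed", "action.finished"];
const ALLOWED_WEB_ORIGINS = ["https://studio.asyncapi.com"];

interface IncomingMetric {
  name: string;
  value: number;
  attributes?: Record<string, string>;
}

type Verdict = { ok: true } | { ok: false; reason: string };

function checkForwardRequest(metric: IncomingMetric, referer?: string): Verdict {
  // Reject anything that is not one of the metrics we decided to collect.
  if (ALLOWED_METRICS.indexOf(metric.name) === -1) {
    return { ok: false, reason: "unknown metric" };
  }
  if (typeof metric.value !== "number" || !isFinite(metric.value)) {
    return { ok: false, reason: "invalid value" };
  }
  // The Referer header is optional: CLI requests will not carry one,
  // so it is only checked when present.
  if (referer !== undefined) {
    let origin: string;
    try {
      origin = new URL(referer).origin;
    } catch {
      return { ok: false, reason: "malformed referer" };
    }
    if (ALLOWED_WEB_ORIGINS.indexOf(origin) === -1) {
      return { ok: false, reason: "referer not allowed" };
    }
  }
  return { ok: true };
}
```

Only requests that pass this check would be forwarded with the API key attached, so the key itself never leaves the service.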

---
title: Measure AsyncAPI Adoption - With metrics forwarder
---
flowchart LR;
subgraph Metrics visualization
    NR[NewRelic]-- embeddable widgets --> AsyncAPIWebsite
    AsyncAPIWebsite -- widgets public URL --> NR[NewRelic]
end
subgraph Metrics collection
    Studio & CLI & Others-- metrics --> MetricsForwarder
    MetricsForwarder -- metrics --> NewRelic
end

We can always go ahead and try, trusting in humanity and the fact that nobody hates this project, which I'm fine with :)

fmvilas commented 9 months ago

What about using Google Analytics? Have you considered it? AFAIK, there won't be many issues with rate limits. Also, exposing the token won't be an issue since it's something that also happens in the browser. I mean, not exactly a token but a GA ID. They also have a query param you can use so they don't track IPs (or at least they promise so 😄).

fmvilas commented 9 months ago

Leaving this here for reference: https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#key-details. We should have a look at Browser and Mobile App key options. They're essentially the same as what Google Analytics provides.

smoya commented 9 months ago

> Leaving this here for reference: https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#key-details. We should have a look at Browser and Mobile App key options. They're essentially the same as what Google Analytics provides.

I already considered using Browser and it's in fact a good solution, even though it is focused on web apps. The good thing about Browser is that you can also, IIRC, limit the referer to a list of known webpages. The downside is that we won't be able to do that for tools like the CLI. But anyway, it's better to publicly show a browser key than the license key.

I'm gonna do a quick test and see how it behaves with a non webpage app. Coming back in a few.

smoya commented 9 months ago

> Leaving this here for reference: https://docs.newrelic.com/docs/apis/intro-apis/new-relic-api-keys/#key-details. We should have a look at Browser and Mobile App key options. They're essentially the same as what Google Analytics provides.
>
> I already considered using Browser and it's in fact a good solution, even though it is focused on web apps. The good thing about Browser is that you can also, IIRC, limit the referer to a list of known webpages. The downside is that we won't be able to do that for tools like the CLI. But anyway, it's better to publicly show a browser key than the license key.
>
> I'm gonna do a quick test and see how it behaves with a non webpage app. Coming back in a few.

As expected, Browser won't work with non-website apps. Just by looking at the snippet you need for loading the agent, you can see that the window object is being used.

We could use Browser for the Studio app in order to collect runtime metrics (performance, load times, etc) but this is out of the scope of this issue.

Browser is discarded. Mobile doesn't make sense IMHO since it's like APM but with another layer on top to unify frontend + backend.

The reality is that the solution in New Relic is to use the metrics API, the one I talked about in my previous comments. We can indeed use GA as an alternative. I'm not very familiar with sending custom metrics and querying them in the latest GA version anyway, so if you have a clear path that can save us investigation time, please share.

In fact, the GA snippet code needs to be loaded in a web app. Not sure if there is a newer alternative to that.

smoya commented 9 months ago

> In fact, the GA snippet code needs to be loaded in a web app. Not sure if there is a newer alternative to that.

With https://developers.google.com/analytics/devguides/collection/protocol/ga4, it is possible to send events from any other source, but the way to do that is mostly the same as with New Relic: send an HTTP request to a particular endpoint on their side and provide an API key.
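
For reference, a request to the GA4 Measurement Protocol could be built roughly like this. The endpoint, the `measurement_id` and `api_secret` query parameters, and the `client_id`/`events` body shape follow the GA4 Measurement Protocol documentation linked above; the function name, the event name and all values are placeholders invented for this sketch.

```typescript
// Describes the HTTP request without sending it, so the shape is easy to inspect.
interface Ga4Request {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGa4Event(
  measurementId: string, // placeholder, e.g. "G-XXXXXXX"
  apiSecret: string, // the key that would end up exposed in the client
  clientId: string, // should be a random, anonymous identifier
  eventName: string,
  params: Record<string, string | number>,
): Ga4Request {
  const query =
    "measurement_id=" + encodeURIComponent(measurementId) +
    "&api_secret=" + encodeURIComponent(apiSecret);
  return {
    url: "https://www.google-analytics.com/mp/collect?" + query,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      client_id: clientId,
      events: [{ name: eventName, params }],
    }),
  };
}
```

The result can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. As the comment above says, the API secret travels with every request, just like the New Relic key would.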

[Image: Google Chrome_yuujapWM@2x]

fmvilas commented 9 months ago

Just checked how Brew is doing it and was surprised. In the past they were using GA, but now they're using https://influxdata.com. Have a look: https://github.com/Homebrew/brew/blob/HEAD/Library/Homebrew/utils/analytics.rb. It may be interesting to consider too.

fmvilas commented 9 months ago

Also, GA has the Measurement Protocol alternative, which I don't think needs any secret to be exposed: https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide?hl=en. That said, in one way or another, every service will ask you for a key. Exposing this key publicly doesn't matter if all you can do with it is send data. This is already possible from the browser console anyway and "nobody" is hacking it.

smoya commented 9 months ago

While we find the right metrics platform, I created a first POC of the shared library that will record the final metrics. Its functionality is very basic at this point, but it can help others (cc @peter-rr) start collaborating moving forward.

See it at https://github.com/smoya/asyncapi-adoption-metrics. You can see a usage example in the following test: https://github.com/smoya/asyncapi-adoption-metrics/blob/main/test/recorder.spec.ts

The next steps would be to create all required shortcut methods on that metrics recorder for all actions we think we could record. For example, the recordActionExecution() method is meant for recording CMD executions like validate but also could be an action in the Studio.

As I said, very POC stuff. There are TODOs, like the New Relic sink, where you will find pending work such as converting metrics to the New Relic format before sending. Please feel free to ask any questions!

Amzani commented 9 months ago

@smoya great POC. Could you elaborate on the anatomy of the actions we record (e.g. recordActionExecution())? Are these time series data? I'm asking just in case we switch the metrics platform (for instance to influxdata).

smoya commented 9 months ago

> @smoya great POC. Could you elaborate on the anatomy of the actions we record (e.g. recordActionExecution())? Are these time series data? I'm asking just in case we switch the metrics platform (for instance to influxdata).

That method is just an example of the metrics we might want to collect, e.g. the number of validate CLI command calls. Metrics are still TBD, so I only added support for the GAUGE and COUNT timeseries metric types. But it is easy to add any other, since this library does not handle metrics and their behaviour; it just collects them and sends them anywhere via its sinks (e.g. New Relic).

You could tomorrow change from New Relic to any other timeseries provider by just creating a new sink. The rest won't change. I took that design decision based on the current situation where it is not yet clear which provider we will use due to the security concerns when exposing the api keys. In that way, we are not that blocked.
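
To illustrate why swapping providers only means writing a new sink, here is a simplified sketch of the recorder/sink split. It is not the actual API of asyncapi-adoption-metrics: apart from `recordActionExecution` and the COUNT and GAUGE types mentioned in this thread, every name and signature below is a hypothetical stand-in.

```typescript
// Illustrative only: a recorder that knows nothing about the provider,
// plus a Sink interface that each provider would implement.
type MetricType = "COUNT" | "GAUGE";

interface Metric {
  name: string;
  type: MetricType;
  value: number;
  metadata: Record<string, string>;
}

interface Sink {
  send(metrics: Metric[]): Promise<void>;
}

// A sink for tests; a New Relic or InfluxDB sink would convert the
// metrics to the provider's format and perform the HTTP call instead.
class InMemorySink implements Sink {
  readonly received: Metric[] = [];

  async send(metrics: Metric[]): Promise<void> {
    this.received.push(...metrics);
  }
}

class Recorder {
  private buffer: Metric[] = [];
  private readonly sink: Sink;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  count(name: string, metadata: Record<string, string> = {}): void {
    this.buffer.push({ name, type: "COUNT", value: 1, metadata });
  }

  gauge(name: string, value: number, metadata: Record<string, string> = {}): void {
    this.buffer.push({ name, type: "GAUGE", value, metadata });
  }

  // Shortcut for the "a command or Studio action was executed" case.
  recordActionExecution(action: string): void {
    this.count("action.executed", { action });
  }

  pending(): number {
    return this.buffer.length;
  }

  // Hands the buffered metrics to the sink and empties the buffer.
  async flush(): Promise<void> {
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length > 0) await this.sink.send(batch);
  }
}
```

Changing provider would then be `new Recorder(new SomeOtherSink())`, with the recording calls in the CLI and Studio left untouched.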

smoya commented 9 months ago

~Something I find challenging is imagining the metrics we want to collect in Studio. CLI is easy; counting each command execution (validate, generate, etc), extract data from documents, etc. However, the Studio is basically a live editor, where things happen in background, like re-parsing and validating, on every time you update the document. We might want to set some kind of limitation on those in order to get metrics that make sense to us. Deeply thought... what metrics do we need to ensure we know how users use the Studio?~

Moved into https://github.com/asyncapi/studio/issues/812

cc @fmvilas @Amzani

smoya commented 9 months ago

/progress 10 made a first POC version of the shared library that tracks metrics.

smoya commented 9 months ago

@fmvilas I have no permission to edit this issue. Would you mind replacing bullet points "Integration with CLI" and "Integration with Studio" with the following respectively?

Thanks!

Amzani commented 9 months ago

@smoya done.

Amzani commented 8 months ago

Open source alternative to New Relic: https://signoz.io/

smoya commented 8 months ago

/progress 25 Updated the shared library so it can decorate metadata based on an AsyncAPI document - https://github.com/smoya/asyncapi-adoption-metrics/pull/2
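
As a guess at what "decorating metadata based on an AsyncAPI document" might look like, here is a small sketch that derives non-private, structural attributes from a parsed document. It is not taken from the linked PR; the function name and the attribute keys are assumptions, and only the top-level `asyncapi`, `channels` and `servers` fields of the AsyncAPI document format are relied on.

```typescript
// Only the fields this sketch reads; a real parsed document has many more.
interface MinimalAsyncAPIDocument {
  asyncapi?: string;
  channels?: Record<string, unknown>;
  servers?: Record<string, { protocol?: string }>;
}

// Returns structural facts only (version, counts, protocols). No names,
// URLs or payload contents are copied, so nothing private leaves the document.
function metadataFromDocument(doc: MinimalAsyncAPIDocument): Record<string, string> {
  const servers = doc.servers ?? {};
  const protocols: string[] = [];
  for (const key of Object.keys(servers)) {
    const protocol = servers[key]?.protocol;
    if (protocol !== undefined && protocols.indexOf(protocol) === -1) {
      protocols.push(protocol);
    }
  }
  protocols.sort();
  return {
    asyncapi_version: doc.asyncapi ?? "unknown",
    channel_count: String(Object.keys(doc.channels ?? {}).length),
    server_protocols: protocols.join(","),
  };
}
```

Attributes like `asyncapi_version` are what would make statements such as "60% of the documents are using version 2.4.0" possible.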

smoya commented 8 months ago

/progress 35 Made a POC for CLI registering one metric https://github.com/asyncapi/cli/pull/859