derberg opened this issue 2 years ago
Welcome to AsyncAPI. Thanks a lot for reporting your first issue. Please check out our contributors guide and the instructions about a basic recommended setup useful for opening a pull request.
Keep in mind there are also other channels you can use to interact with AsyncAPI community. For more details check out this issue.
@derberg Sounds interesting ... I would like to take this issue as my GSOC'22 proposal.😊
@ritik307 sounds awesome!
@smoya @BOLT04 @magicmatatjahu any objections to having this endpoint first on server-api? I personally think it's better to add it here and then, if we measure too much traffic, we can always split it into a separate microservice
No problem for me, but we have to remember that we also provide that project as a Docker image, so people will also have that path. We have to think about how to avoid unnecessary paths for people who use that project.
> any objections to having this endpoint first on server-api? I personally think it's better to add it here and then, if we measure too much traffic, we can always split it into a separate microservice
@derberg no problem for me 🙂, this is pretty cool!
> No problem for me, but we have to remember that we also provide that project as a Docker image, so people will also have that path. We have to think about how to avoid unnecessary paths for people who use that project.
@magicmatatjahu I get what you're saying and if this new endpoint does in fact need to use external services (e.g. Google APIs), we would need new config/environment variables for API keys, etc. I propose we use feature flags to solve this. On our deployed version of the API the feature is on, but for local development it's not. If someone wants to try it out locally, they just have to configure the necessary values and turn the toggle on to start measuring spec adoption in their own environment 🙂 What do you think?
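A minimal sketch of that feature-flag idea, assuming an environment-variable toggle (the variable name, routes, and function below are hypothetical illustrations, not actual server-api code):

```javascript
// Hypothetical sketch of the feature-flag idea: the adoption endpoint is
// only registered when an env toggle is on, so local/Docker users don't
// get the extra path by default. All names are made up for illustration.
function routesFor(env) {
  const routes = ["/v1/validate"]; // placeholder for the existing routes
  if (env.ENABLE_SPEC_ADOPTION_METRICS === "true") {
    routes.push("/v1/spec-adoption"); // hypothetical new endpoint
  }
  return routes;
}

console.log(routesFor({})); // → [ '/v1/validate' ]
console.log(routesFor({ ENABLE_SPEC_ADOPTION_METRICS: "true" }));
// → [ '/v1/validate', '/v1/spec-adoption' ]
```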
I love the idea of adding our schemas to Schema Store. I didn't know about it until 5 minutes ago and I like it a lot 👍.
I want to add some feedback regarding the creation of a service for serving the schemas:
Serving static files such as JSON Schema files in a fast and reliable way is exactly why CDNs exist. Considering the possible amount of traffic this service will have, and the fact that it will keep growing over time (more users, more tooling, etc.), I would not advocate for creating and maintaining this service ourselves. In fact, the good news is that we are already using a free CDN for serving our website: Netlify (which uses multiple cloud providers).
I understand we want this service because we need those metrics (maybe there is another strong reason I missed, so please correct me). Would it make sense to just investigate and ask/pay for their Analytics product?
There is also the following draft PR by @jonaslagoni: https://github.com/asyncapi/website/pull/502 that might make sense to check. It aims to serve AsyncAPI JSON Schema files from our website.
On the other hand, we could consider the same approach with any other CDN product that offers analytics, such as AWS S3, GCP Cloud Storage, etc. (asking for budget, etc.).
I would like to know your thoughts.
cc @derberg @ritik307 @magicmatatjahu @BOLT04
Cool. I think the most important thing is that you support the idea. It is not written in stone to have it as an endpoint here:
@magicmatatjahu @smoya @BOLT04 please only keep in mind that we should leave as much as possible up to @ritik307 (if you still want to take this task for GSoC). You folks turn into mentors, just guide @ritik307 what needs to be checked and tried out to get the desired outcome.
> @magicmatatjahu @smoya @BOLT04 please only keep in mind that we should leave as much as possible up to @ritik307 (if you still want to take this task for GSoC). You folks turn into mentors, just guide @ritik307 what needs to be checked and tried out to get the desired outcome.
Sure @derberg I would love to take this task for GSOC 😊 and it would be great if you guys mentor me.😊
I think that file hosting and adding metrics to ServerAPI itself will not be a problem. We have control over every part, so we won't have to use additional services. However, CDN would be better in this case and I am for this option!
I would like to retake this, especially after @derberg raised his concerns on https://github.com/asyncapi/website/pull/502#issuecomment-1070960957.
There is something we should consider before moving forward with a custom solution based on our own service. Right now we do not have services exposed openly for consumption at the same frequency as static JSON Schema files would be. Exposing a service with such an important responsibility (it would serve our JSON Schema files!) should include a large battery of APM + infrastructure monitoring, and maybe in the future (emphasis on future) also some on-call rotation. It might seem far off today, but if our user adoption keeps growing as it does, it will become a thing.
With a CDN provided by a SaaS company, you remove all of those concerns.
Again, I know we want some metrics, but IMHO it is totally worth asking Netlify and, if it fulfills our goals, paying for the Analytics service if needed. I can tell you it's worth paying for a service rather than having to run your own highly available service.
regarding @smoya point about maybe using Netlify Analytics in combination with CDN. This is also one of the possible options. Some investigation for sure needs to be done first. This can definitely be an outcome for this task. I personally prefer CDN, just ignored the fact that Netlify might have some Analytics for it.
This is definitely what I prefer since you mentioned Netlify Analytics. Did you mean this https://docs.netlify.com/monitor-sites/analytics/ or something else?
> regarding @smoya point about maybe using Netlify Analytics in combination with CDN. This is also one of the possible options. Some investigation for sure needs to be done first. This can definitely be an outcome for this task. I personally prefer CDN, just ignored the fact that Netlify might have some Analytics for it.

> This is definitely what I prefer since you mentioned Netlify Analytics. Did you mean this https://docs.netlify.com/monitor-sites/analytics/ or something else?
Yup, this is the service I meant.
Netlify Analytics is available and ready, right in the dashboard, for any site you deploy to Netlify. It only costs $9/mo per site. Source: https://www.netlify.com/products/analytics/
some important info: https://github.com/asyncapi/website/pull/502#issuecomment-1088363548
IMHO we stay with the option to have all JSON files in `server-api`, which would work like a proxy to do analytics. It is up to `server-api` maintainers to decide if it is ok to first do it in `server-api` and then, because of the load, split it later. Nevertheless, IMHO JSON files should not be exposed directly on the website here, as we are looking at an opportunity to track adoption.
TL;DR: I still think we should avoid creating a new file server app and instead look for an alternative based on a SaaS provider. I'm suggesting some alternative ideas to the previous one, and I'm happy to keep evolving this idea and also to put it into practice ASAP.
I understand the need to get such metrics and how simple it seems to build a file server with built-in metrics. However, I want to stay strong on this idea: we should avoid managing services on our own (at this time). Some of the reasons have been laid out already in (my) previous comments, but I'm going to list some of them here in a bit more detail.
AsyncAPI JSON Schema definitions are the most important pieces of software we provide to the community (IMHO). They are meant to be used by systems for parsing and validating AsyncAPI documents, and by services that use them at runtime for validating messages, among other use cases. We do have a package for both NodeJS and Go projects that users can use to import those schemas into their projects; however, we don't for any other language, meaning tooling will need to fetch those files from the source at some point.
However, who are the users of those raw files, and how do they use them? I can imagine a few use cases:
With this in mind, the following points are worth noting:
Having said that, I'm proposing we stick with a SaaS-based solution from day one, one that lets us take care of only the very minimum: at most, collecting the metrics and processing them, but never serving the files.
We tried Netlify Analytics. Unfortunately, the metrics we want (hits on JSON Schema files) are not collected. Even though it's probably a matter of time before they support it, we don't have an ETA for it.
There are several other ways we can do this, and those are some of the ideas I have in mind:
Netlify Log Drains allows sending both traffic logs and function logs to an external service, such as New Relic, Datadog, S3... and also to our own service (which could be a Netlify Function as well). Netlify sends those logs in batches in near real-time, in JSON/NDJSON format. You can see the output of those logs here. This is not available in all plans, but I'm sure the Netlify support team will be happy to enable this, especially now that we tried Analytics and it didn't fulfill our use case.
```mermaid
sequenceDiagram
    participant User
    participant asyncapi.org (Netlify)
    participant AsyncAPI Metrics collector
    Note right of AsyncAPI Metrics collector: Netlify Function <br/>or<br/> any monitoring SaaS
    User->>asyncapi.org (Netlify): https://asyncapi.org/definitions/2.3.0.json
    asyncapi.org (Netlify)->>User: 2.3.0.json
    asyncapi.org (Netlify)-->>AsyncAPI Metrics collector: Netlify Log Drains metrics
```
With this approach, and in the more complex solution, we will only care about the metrics collector service, which could eventually be down but won't affect the user request. In the case of using any SaaS, it will be straightforward. As a side note, there are free tiers in services like NewRelic that maybe could fit our case.
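To make the Log Drains path concrete, here is a rough sketch (not production code) of what the "AsyncAPI Metrics collector" in the diagram could do with a batch of NDJSON log entries; the `url` field name and the sample entries are assumptions about the log format:

```javascript
// Hypothetical collector: parse a batch of Netlify Log Drain entries
// (NDJSON, one JSON object per line) and count hits on definition files.
function countDefinitionHits(ndjsonBatch) {
  const counts = {};
  for (const line of ndjsonBatch.split("\n")) {
    if (!line.trim()) continue;
    const entry = JSON.parse(line);
    // Assumed log field: the requested URL path.
    const match = /^\/definitions\/(.+\.json)$/.exec(entry.url || "");
    if (match) counts[match[1]] = (counts[match[1]] || 0) + 1;
  }
  return counts;
}

// Example batch (fabricated entries for illustration):
const batch = [
  JSON.stringify({ url: "/definitions/2.3.0.json" }),
  JSON.stringify({ url: "/definitions/2.3.0.json" }),
  JSON.stringify({ url: "/docs/index.html" }),
].join("\n");

console.log(countDefinitionHits(batch)); // → { '2.3.0.json': 2 }
```

The counts could then be pushed to whatever backend we choose (New Relic, Datadog, or our own storage).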
Netlify Edge Handlers work by letting you execute code directly on the edge, intercepting the request. We could run JavaScript code there to collect the metrics we want; in our case, hits on the definition files. This is in BETA right now (you have to ask for it to be enabled). However, I would ask them for an ETA for going public; I guess they have plans to release a public beta in the short-to-mid term.
EDIT: Netlify Edge Functions are now public beta, available for free. https://www.netlify.com/blog/announcing-serverless-compute-with-edge-functions
AWS S3 is a well-known solution for storing files. And with the metrics it exposes (CloudWatch), we could know the number of `GET` operations per file.
We would need to add a Netlify rewrite rule (not a redirect) that proxies the requests to the S3 bucket. This is easy to configure through the `netlify.toml` file.
```mermaid
sequenceDiagram
    participant User
    participant asyncapi.org (Netlify)
    participant AWS S3
    User->>asyncapi.org (Netlify): https://asyncapi.org/definitions/2.3.0.json
    asyncapi.org (Netlify)->>AWS S3: Netlify rewrite rule to asyncapi.s3.amazonaws.com/definitions/2.3.0.json
    AWS S3->>asyncapi.org (Netlify): 2.3.0.json
    asyncapi.org (Netlify)->>User: 2.3.0.json
```
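For reference, a Netlify rewrite that proxies instead of redirecting is just a redirect rule with `status = 200`. A hypothetical `netlify.toml` fragment for this approach (the bucket URL is an assumption) could look like:

```toml
# Hypothetical proxy rewrite for the S3 approach; with status = 200 Netlify
# serves the proxied response instead of issuing a redirect.
[[redirects]]
  from = "/definitions/*"
  to = "https://asyncapi.s3.amazonaws.com/definitions/:splat"
  status = 200
  force = true
```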
The price for this is not high at all. I did a quick estimation for 30 million requests per month (yeah, a lot) here. We should also include the price of the CloudWatch metrics, but IIRC it's almost nothing.
If price is a concern, we could investigate Cloudflare R2, which is super cheap. However, I don't know at this moment what metrics they provide. Also, we would need to ask for access to R2, as it is in beta at the moment.
As of today, Netlify Edge Functions (previously known as Edge Handlers) are in public beta, available for free. https://www.netlify.com/blog/announcing-serverless-compute-with-edge-functions
With this, we could add the metrics push into the Netlify function: https://github.com/asyncapi/website/pull/680
Taking this one off GSoC as it is an important topic to handle and can't be delayed.
How to start 😄 Lemme start with the positives ❤️
I love idea from https://github.com/asyncapi/website/pull/680 ❗
On the "negative" side, I have a completely different view on the maintenance/high-availability/response-time topics:
So, let's go forward with the idea from https://github.com/asyncapi/website/pull/680 ❗
Alternative/compromise: let's not mix topics and try to solve everything with one solution. Maybe https://github.com/asyncapi/website/pull/680 could have 2 alternative paths: one for the needs related to AsyncAPI JSON Schema and `$id`, and the other that we use only in Schema Store. One solution with separate paths, and the measured data stays clean. It would still depend on the same rate limits anyway, of course.
I've been playing with Google Analytics 4 as a candidate for publishing our metrics. I have to say, I didn't get a good result. We could send events through the Measurement Protocol and it would kinda do the job, but the UX for reading those metrics is completely awful:
As we can see, everything is focused on web apps, so it's not a really good fit for us. I know @derberg has played a lot with GA, Google Tag Manager, etc. Do you think it is still a fit for this, or should we rather consider another alternative?
I've been checking the new NewRelic One free tier, and it allows sending up to 100GB of data, events included. I did a simple test with a POST request and created a simple dashboard to see what it would look like.
Btw, New Relic has NRQL, a custom query language that allows you to easily query anything you send to them in a SQL query language fashion.
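For illustration, assuming we sent one custom event per download (the `AsyncAPIFileDownload` event type and `file` attribute below are made-up names, not an agreed schema), counting hits per file in NRQL could look something like:

```sql
-- Hypothetical NRQL query; event type and attribute names are assumptions
SELECT count(*) FROM AsyncAPIFileDownload FACET file SINCE 1 week ago
```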
If anyone has another suggestion, I'm happy to keep investigating (there are plenty of others out there)
In the meantime, I'm moving forward with the New Relic solution for now, and the development is all here: https://github.com/asyncapi/website/pull/680
In case you want to use another provider for metrics, I'm happy to adapt the code.
More on https://github.com/asyncapi/website/pull/680#issuecomment-1117469838
GA also allows you to create new views, custom components, scheduled reports, etc. But yeah, I'm not a GA evangelist.
Tbh I think the approach with New Relic is super nifty, as long as we can use it for free of course 😆 I guess you @smoya and @fmvilas can get us more free storage anyway if we need it 😆
❤️ from me for New Relic
Does it mean we have an agreement on implementation? 🙌🏼
Before we finalize, we need to give time to @magicmatatjahu @jonaslagoni @BOLT04 @fmvilas to voice their opinions, as they own either the website or this repo, or just need the solution (like Jonas).
yeah, let's go with the New Relic solution proposed by @smoya 👍 I think with that the Server API doesn't need any implementation, so we could close this issue when that PR is merged, right?
wdyt everyone?
I think we can even transfer it to https://github.com/asyncapi/website now 🤔
Awesome @smoya 👍
I am also in favour of a solution using the New Relic @smoya 👏🏼
Yeah me too. Let's use New Relic. They have a powerful query language (NRQL) and it's easy to create new views of data 👍
JSON Schemas are now being served successfully under asyncapi.com/definitions and asyncapi.com/schema-store. A New Relic dashboard has also been created (it can't be public, unfortunately):
PR to Schema-Store is waiting for review: https://github.com/SchemaStore/schemastore/pull/2310
cc @derberg @fmvilas
JSON Schema Store PR has been merged now, meaning all JSON Schema files fetched from it are now being downloaded from asyncapi.com/schema-store and metrics show that users are already fetching them:
cc @derberg
Omg this is so exciting 😍
❤️ Indeed! @smoya start thinking how do we send custom metrics from tooling 😝
> ❤️ Indeed! @smoya start thinking how do we send custom metrics from tooling 😝
We would need to expose a service that acts as a metrics ingest, forwarding them to NR, so we don't expose the NR API key in tooling but just send metrics to our service.
I would think about it eventually!
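A rough sketch of that ingest idea, assuming we would forward custom events to New Relic's Event API (the event shape and all names are assumptions, not an agreed design):

```javascript
// Hypothetical metrics-ingest sketch: tooling posts to our service, which
// forwards events to New Relic so the NR API key never ships with the tools.
function buildEvent(tool, name, value) {
  // `AsyncAPIToolMetric` is a made-up event type for illustration.
  return {
    eventType: "AsyncAPIToolMetric",
    tool,
    name,
    value,
    timestamp: Date.now(),
  };
}

async function forwardToNewRelic(events, accountId, apiKey) {
  // New Relic Event API endpoint; credentials stay on our server.
  return fetch(
    `https://insights-collector.newrelic.com/v1/accounts/${accountId}/events`,
    {
      method: "POST",
      headers: { "Api-Key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify(events),
    }
  );
}
```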
After fixing https://github.com/asyncapi/spec-json-schemas/issues/236, JSON Schemas for different AsyncAPI versions are being downloaded from JSON Schema Store:
I see there are downloads of all versions, and I really doubt those downloads are organic or on purpose. What I think is happening is that, since the schema served by Schema Store now is https://github.com/asyncapi/spec-json-schemas/blob/master/schemas/all.schema-store.json, JSON Schema parsers might be downloading ALL referenced (`$ref`) schemas at once instead of on demand.
However, VSCode IDE is still showing the error:
which @derberg mentioned already in https://github.com/redhat-developer/vscode-yaml/discussions/772#discussioncomment-3033074. So 🤷 ...
cc @derberg @fmvilas
> which @derberg mentioned already in https://github.com/redhat-developer/vscode-yaml/discussions/772#discussioncomment-3033074. So 🤷 ...
I think we entered the world where we have to decide if we want to do things in our JSON Schema the way spec and spec maintainers recommend, or just adjust schema to work with tooling provided by the community 🤷🏼
> I see there are downloads of all versions, and I really doubt those downloads are organic or on purpose.
yeah, the numbers for `2.0.0-rc1` and `2.0.0-rc2` are suspiciously high and the same 😄
I think you are completely right about the reason: it is due to refs parsing. Can we measure the number of times `https://www.asyncapi.com/schema-store/all.schema-store.json` is fetched and automatically subtract that number from the other downloads directly in the chart, without manual calculation? (kinda hacky, but I don't believe there is another solution)
> Can we measure the number of times `https://www.asyncapi.com/schema-store/all.schema-store.json` is fetched and automatically subtract that number from the other downloads directly in the chart, without manual calculation? (kinda hacky, but I don't believe there is another solution)
But if we do that, we will be invalidating all the counts for legitimate downloads. Correct me if I'm wrong, but:
Considering that 1 fetch of `all.schema-store.json` ends up doing 10 fetches (one for each schema per AsyncAPI version), let's say we start from scratch and we just do one fetch:
| Downloads | File |
|---|---|
| 1 | `all.schema-store.json` |
| 1 | `1.0.0.json` |
| 1 | `1.1.0.json` |
| 1 | `1.2.0.json` |
| 1 | `2.0.0-rc1.json` |
| 1 | `2.0.0-rc2.json` |
| 1 | `2.0.0.json` |
| 1 | `2.1.0.json` |
| 1 | `2.2.0.json` |
| 1 | `2.3.0.json` |
| 1 | `2.4.0.json` |
We can't just subtract `1` from each download, because this would end up happening:
| Downloads | File |
|---|---|
| 1 | `all.schema-store.json` |
| 0 | `1.0.0.json` |
| 0 | `1.1.0.json` |
| 0 | `1.2.0.json` |
| 0 | `2.0.0-rc1.json` |
| 0 | `2.0.0-rc2.json` |
| 0 | `2.0.0.json` |
| 0 | `2.1.0.json` |
| 0 | `2.2.0.json` |
| 0 | `2.3.0.json` |
| 0 | `2.4.0.json` |
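The tables above can be sketched as a quick toy calculation (fabricated numbers, just to show why the subtraction can't separate legitimate direct downloads from bundle-induced ones):

```javascript
// One bundle fetch triggers one fetch of every version file; subtracting
// the bundle count from each per-version counter zeroes them all, and any
// legitimate direct download is buried in the same counter.
const downloads = {
  "all.schema-store.json": 1,
  "2.3.0.json": 1,
  "2.4.0.json": 1,
};
const bundleFetches = downloads["all.schema-store.json"];
const adjusted = {};
for (const [file, count] of Object.entries(downloads)) {
  if (file === "all.schema-store.json") continue;
  adjusted[file] = count - bundleFetches;
}
console.log(adjusted); // → { '2.3.0.json': 0, '2.4.0.json': 0 }
// If one of those hits had been a direct download, it would still show 0:
// we can't tell the two kinds of fetches apart after the subtraction.
```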
@smoya yeah, you are right 🤦🏼 it sucks
@smoya so looks like we can only measure adoption of the spec in general, not its specific versions?
> @smoya so looks like we can only measure adoption of the spec in general, not its specific versions?
Yes; as the IDE plugins download just one schema (containing all of the versions), we can't know which one they are using. And since Schema Store matching is based on file patterns and not on the content of the file, there is no way we could send data in the request made to our servers (for example, a header including the version).
So unfortunately, I'm running out of ideas here. I could open an issue in Schema Store repo asking for ideas.
It is not that bad. For me, most important is to measure how many users we have. So adoption of the spec in general, and not each version. I'm personally skeptical of such measurements, as then people complain that new versions are not adopted forgetting that they also do not use new versions if they do not need them (anyway, not topic for this issue).
If you can open a discussion with Schema Store on how to fix things in the future, that would be amazing. Even if I'm not interested in specific version adoption, I bet others are 😄
Can you adjust the dashboard in New Relic 🙏🏼
So what is left is:
missing something?
I will definitely be interested to know if people are really adopting version 3.0 once it's out. Would be cool to get some insights. Maybe it's time to measure it on our tools.
> As even if I'm not interested with specific version adoption, I bet others are
I think it is a crucial metric, even though not the only method to collect data from. I would love to have a metric where, after a release, we could see how downloads for older versions go down in favor of the new one.
> If you can open a discussion with Schema Store, on how to fix things in future, that would be amazing. As even if I'm not interested with specific version adoption, I bet others are 😄
Done. No hope at all anyway. https://github.com/SchemaStore/schemastore/issues/2440
> dashboard adjustment
Do you mean removing the versions stuff from it?
> persisting data for lifetime
Do we really need that? With New Relic, we have 1 year right now. If more is needed, we could write some scripts to do aggregations every few months.
> investigating how data collection actually works, caching of schema by plugins, and etag refresh on Netlify side. So we know if we actually get data of only "daily active users" or "increasing number of new users"
Related: https://github.com/SchemaStore/schemastore/issues/2438
> Do you mean removing the versions stuff from it?
yeah, until we get it solved, this metric is not helpful, we just need total number
> Do we really need that? With New Relic, we have 1 year right now. If more is needed, we could write some scripts to do aggregations every few months.
yes, we need lifetime data to see how numbers change over the years. But I don't mean we need that support in New Relic; an automated script, maybe running on GitHub Actions on a schedule, is also fine 👍🏼
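That scheduled-script route could be sketched as a GitHub Actions workflow like the one below (the workflow name, cron, script path, and secret name are all hypothetical placeholders):

```yaml
# Hypothetical workflow: periodically query New Relic and commit the
# aggregated totals so the numbers survive NR's retention window.
name: aggregate-schema-metrics
on:
  schedule:
    - cron: "0 0 1 * *" # monthly
jobs:
  aggregate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: node scripts/aggregate-metrics.js # hypothetical script
        env:
          NR_API_KEY: ${{ secrets.NR_API_KEY }}
```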
> Related: https://github.com/SchemaStore/schemastore/issues/2438
yeah, not much help, other than knowing you can clear the cache on demand. The source code indicates it is based on `etag`. What we need to check is what Netlify does when the website gets redeployed: whether the etag for all resources, even redirects, is refreshed or not. We are doing some magic there 😄
This issue has been automatically marked as stale because it has not had recent activity :sleeping:
It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.
There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.
Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.
Thank you for your patience :heart:
> @smoya so looks like we can only measure adoption of the spec in general, not its specific versions?
FYI, I tried to give it a last try, but didn't succeed 😞. All the info can be found at https://github.com/SchemaStore/schemastore/issues/2440#issuecomment-1857683852.
cc @derberg @fmvilas
overall adoption is still a great number to have 👍
FYI, I created https://github.com/SchemaStore/schemastore/issues/3460 as a feature request in Schema Store that, if adopted, will help us achieve our mission.
@smoya may I know the update of this issue please 😅
> @smoya may I know the update of this issue please 😅
What do you need to know in particular?
> What do you need to know in particular?
Like, are we going forward with this issue or not?
Reason/Context
We do not know how many people use AsyncAPI. The most accurate number we could get is the number of AsyncAPI users that work with AsyncAPI documents. But how do we measure how many people out there have created/edited an AsyncAPI file?
The answer is a solution that includes:

- `asyncapi` in a filename created using the AsyncAPI spec

Some more discussion -> https://asyncapi.slack.com/archives/C0230UAM6R3/p1622198311005900
Description
- `server-api` service that anyone can use to fetch AsyncAPI JSON Schema files of any version

If there is time left, we need to expose the numbers somewhere: either embed a Google Analytics diagram somewhere on the AsyncAPI website or at least have an API endpoint that exposes the latest numbers.
For GSoC participants