deepkit / deepkit-framework

A new full-featured and high-performance TypeScript framework
https://deepkit.io/
MIT License

Observability tools #567

[Open] Enity opened this issue 1 month ago

Enity commented 1 month ago

Hello! The Node.js ecosystem is currently suffering from a lack of production-ready solutions for enterprise needs.

Upon seeing your landing page, I was intrigued.

However, I couldn't find observability tools in the documentation or source code, except for logs.

Nor did I find them in the NestJS documentation, and there are no mature third-party libraries for this either.

Today, metrics integration is an absolute necessity for production deployments. In mature ecosystems (the JVM stack, C#), it is a basic feature.

Is the creation of such a module planned?

The problem also lies in observability spreading across many parts of the system (logging, HTTP server, ORM, and so on). The later this functionality is implemented, the more difficult it will be to integrate.

Off-topic: could you suggest an alternative, in case you solve this problem differently? I can't find an answer to the question of why the Node.js community seems to ignore metrics completely. How can applications be deployed to production blindly? The only conclusion I've come to is that no one writes enterprise projects in Node.js. Maybe everyone is building BFFs with Next.js, where metrics are not mandatory, or projects with a load of 1 rps.

marcj commented 1 month ago

What do you mean by observability tools? Do you have some examples? Are you referring to something like the Profiler?


Enity commented 1 month ago

Yes, I've seen that profiler, but I'm talking about metrics.

Examples: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics https://micrometer.io/

The application provides metrics that are collected by third-party products. By analyzing metrics, you can identify problem areas in the system, investigate incidents, and so on.
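To make this pull model concrete: with Prometheus, the application usually exposes a plain-text `/metrics` page, and the Prometheus server scrapes it at a fixed interval. Below is a minimal, dependency-free sketch of a labeled counter rendered in the Prometheus text exposition format. The `Counter` class is hypothetical and only stands in for what a real client library (for example prom-client) would provide.

```typescript
// Hypothetical minimal counter (not a real library API), rendered in the
// Prometheus text exposition format that a Prometheus server scrapes.
type Labels = Record<string, string>;

class Counter {
  private readonly series = new Map<string, { labels: Labels; value: number }>();
  readonly name: string;
  readonly help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Labels = {}, amount = 1): void {
    if (amount < 0) throw new Error("counters can only increase");
    // Sort label names so {a, b} and {b, a} identify the same time series.
    const key = JSON.stringify(Object.entries(labels).sort(([a], [b]) => a.localeCompare(b)));
    const entry = this.series.get(key) ?? { labels, value: 0 };
    entry.value += amount;
    this.series.set(key, entry);
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const { labels, value } of this.series.values()) {
      const pairs = Object.entries(labels)
        .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
        .join(",");
      lines.push(pairs ? `${this.name}{${pairs}} ${value}` : `${this.name} ${value}`);
    }
    return lines.join("\n") + "\n";
  }
}

// Label values must escape backslash, double quote and newline.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

const requests = new Counter("http_requests_total", "Total HTTP requests handled.");
requests.inc({ method: "GET", status: "200" });
requests.inc({ method: "GET", status: "200" });
requests.inc({ method: "POST", status: "500" });
console.log(requests.render());
// # HELP http_requests_total Total HTTP requests handled.
// # TYPE http_requests_total counter
// http_requests_total{method="GET",status="200"} 2
// http_requests_total{method="POST",status="500"} 1
```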

And only then will you need a profiler. Otherwise, how will you know what to debug if you have 10k rps on 10 app instances?
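A latency histogram is the usual metric for answering "what should I debug?". Each bucket counts the requests at or below its upper bound (`le`), so buckets are cumulative, and the share of slow requests per route falls out directly. The following is a sketch under stated assumptions: `LatencyHistogram`, `slowShare`, the `route` key and the bucket bounds are all illustrative, not taken from any framework.

```typescript
// Illustrative Prometheus-style histogram keyed by route: cumulative buckets
// (each counts observations <= its upper bound), plus a running sum and count.
type Series = { buckets: number[]; sum: number; count: number };

class LatencyHistogram {
  private readonly bounds: number[];
  private readonly series = new Map<string, Series>();

  constructor(boundsSeconds: number[]) {
    this.bounds = [...boundsSeconds].sort((a, b) => a - b);
  }

  observe(route: string, seconds: number): void {
    const s = this.series.get(route) ?? {
      buckets: Array.from({ length: this.bounds.length }, () => 0),
      sum: 0,
      count: 0,
    };
    // Every bucket whose bound is >= the observation is incremented.
    this.bounds.forEach((le, i) => {
      if (seconds <= le) s.buckets[i]++;
    });
    s.sum += seconds;
    s.count++;
    this.series.set(route, s);
  }

  // Share of requests on a route slower than `le`; `le` must be one of the bucket bounds.
  slowShare(route: string, le: number): number {
    const s = this.series.get(route);
    if (!s || s.count === 0) return 0;
    const i = this.bounds.indexOf(le);
    if (i === -1) throw new Error(`no bucket with le=${le}`);
    return (s.count - s.buckets[i]) / s.count;
  }
}

const latency = new LatencyHistogram([0.05, 0.1, 0.5, 1]);
for (const t of [0.03, 0.04, 0.7, 0.8]) latency.observe("GET /orders", t);
latency.observe("GET /health", 0.002);
console.log(latency.slowShare("GET /orders", 0.5)); // 0.5
console.log(latency.slowShare("GET /health", 0.5)); // 0
```

In an actual Prometheus setup, the server attaches `job` and `instance` labels at scrape time, so the same data can be broken down per route and per instance across all 10 instances with a query such as `histogram_quantile(0.99, sum by (le, route) (rate(http_request_duration_seconds_bucket[5m])))`, where the metric name is an assumption.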


Another example: https://pulse.laravel.com/

However, competing with the millions of dollars invested in Grafana is a dead-end path.

It's much simpler to ship metrics externally, where production-ready software can collect them.

marcus-sa commented 1 month ago

@Enity you want Deepkit to be able to export metrics to e.g. OpenTelemetry? I think we have already discussed this once before @marcj

marcj commented 1 month ago

Yes, we have @deepkit/stopwatch, which was designed for this purpose (generating metrics and sending them), but it currently targets the profiler. Sending the metrics to external services isn't finalised yet; we should do that.
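One way to picture "generating metrics" from profiler-style data: a profiler keeps every individual timing frame, while a metrics backend only needs bounded aggregates per operation, exported periodically. The sketch below assumes a hypothetical `TimingFrame` shape and `aggregateFrames` helper; neither is the actual @deepkit/stopwatch data structure or API.

```typescript
// HYPOTHETICAL frame shape, loosely modelled on what a profiler records.
// This is not the real @deepkit/stopwatch data structure.
interface TimingFrame {
  category: string; // e.g. "http", "database"
  label: string; // e.g. "GET /orders"
  durationMs: number;
}

interface Aggregate {
  count: number;
  totalMs: number;
  maxMs: number;
}

// Collapse individual frames into per-(category, label) aggregates, which is
// the bounded form a metrics backend needs instead of every single frame.
function aggregateFrames(frames: TimingFrame[]): Map<string, Aggregate> {
  const result = new Map<string, Aggregate>();
  for (const f of frames) {
    const key = `${f.category}:${f.label}`;
    const agg = result.get(key) ?? { count: 0, totalMs: 0, maxMs: 0 };
    agg.count++;
    agg.totalMs += f.durationMs;
    agg.maxMs = Math.max(agg.maxMs, f.durationMs);
    result.set(key, agg);
  }
  return result;
}

const stats = aggregateFrames([
  { category: "http", label: "GET /orders", durationMs: 12 },
  { category: "http", label: "GET /orders", durationMs: 30 },
  { category: "database", label: "SELECT orders", durationMs: 8 },
]);
console.log(stats.get("http:GET /orders")); // { count: 2, totalMs: 42, maxMs: 30 }
```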

Enity commented 1 month ago

> @Enity you want Deepkit to be able to export metrics to e.g. OpenTelemetry? I think we have already discussed this once before @marcj

For example, yes. But our company uses Prometheus; it's simply more mature. OpenTelemetry is still listed as incubating at https://www.cncf.io/projects/.

I looked at the stopwatch code. Do I understand correctly that modules report their own metrics, as I described above? If so, this is built into the system architecture, and adding instrumentation for Prometheus or another backend should be possible without much pain.
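If the framework's instrumentation produces backend-neutral data points, supporting Prometheus or some other backend becomes a matter of swapping an exporter. A minimal sketch of that boundary follows; `MetricPoint`, `MetricsExporter` and both exporter classes are hypothetical names, not part of Deepkit or any exporter library.

```typescript
// Hypothetical backend-neutral data point produced by instrumentation.
interface MetricPoint {
  name: string; // e.g. "http_requests_total"
  kind: "counter" | "gauge";
  labels: Record<string, string>;
  value: number;
}

// The only thing a backend has to implement.
interface MetricsExporter {
  export(points: MetricPoint[]): string;
}

// Prometheus text format. Assumes points with the same name are contiguous
// (the format requires a metric family to be grouped); label escaping omitted.
class PrometheusTextExporter implements MetricsExporter {
  export(points: MetricPoint[]): string {
    const out: string[] = [];
    const typed = new Set<string>();
    for (const p of points) {
      if (!typed.has(p.name)) {
        out.push(`# TYPE ${p.name} ${p.kind}`);
        typed.add(p.name);
      }
      const labels = Object.entries(p.labels).map(([k, v]) => `${k}="${v}"`).join(",");
      out.push(labels ? `${p.name}{${labels}} ${p.value}` : `${p.name} ${p.value}`);
    }
    return out.join("\n") + "\n";
  }
}

// A second, equally hypothetical backend: newline-delimited JSON.
class JsonLinesExporter implements MetricsExporter {
  export(points: MetricPoint[]): string {
    return points.map((p) => JSON.stringify(p)).join("\n") + "\n";
  }
}

const points: MetricPoint[] = [
  { name: "http_requests_total", kind: "counter", labels: { method: "GET" }, value: 3 },
  { name: "process_open_fds", kind: "gauge", labels: {}, value: 12 },
];
const exporters: MetricsExporter[] = [new PrometheusTextExporter(), new JsonLinesExporter()];
for (const exporter of exporters) console.log(exporter.export(points));
```

The instrumentation code only ever builds `MetricPoint` values, so adding a backend does not touch the HTTP server, ORM or logger integrations.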

Regarding your own profiler: yes, it's a cool idea, but such a tool is really expensive to develop.

However, when it comes to the enterprise, I can't afford to use a framework without Prometheus metrics support, as many processes depend on it.

Examples:

  1. Kubernetes autoscaling can be configured based on service metrics.
  2. The support team relies on the alerting system, which is configured based on metrics.
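For the autoscaling case, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipping any change when the ratio is within a tolerance (0.1 by default). Below is a simplified sketch of that rule. It ignores stabilization windows, min/max replica bounds and pod readiness handling, and with Prometheus the per-pod metric would typically reach the HPA through an adapter such as the Prometheus Adapter.

```typescript
// Simplified version of the documented HPA scaling rule. Omits stabilization
// windows, minReplicas/maxReplicas clamping and handling of unready pods.
function desiredReplicas(
  currentReplicas: number,
  currentMetricValue: number, // e.g. average rps per pod, read from metrics
  desiredMetricValue: number, // target configured in the HPA spec
  tolerance = 0.1,
): number {
  const ratio = currentMetricValue / desiredMetricValue;
  // Close enough to the target: keep the current replica count.
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
  return Math.ceil(currentReplicas * ratio);
}

console.log(desiredReplicas(10, 1500, 1000)); // 15
console.log(desiredReplicas(10, 1050, 1000)); // 10 (within tolerance)
console.log(desiredReplicas(10, 500, 1000)); // 5
```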

Right now, we're stuck with NestJS and our own tooling. There simply aren't any open-source solutions. It seems the community doesn't need it at all, and it appears that heavily loaded enterprise services aren't being written in Node.js.

marcus-sa commented 1 month ago

> For example, yes. But our company uses Prometheus. It's simply more mature. OpenTelemetry is still incubating on cncf.io/projects.

OpenTelemetry is far better for capturing traces, logs, and metrics. It's also far more flexible. It would make zero sense to use Prometheus when the OpenTelemetry Collector supports converting OTLP metrics to Prometheus metrics.
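For reference, here is a simplified approximation of how an OTLP metric name may end up as a Prometheus name after passing through the Collector's Prometheus exporter: characters outside `[a-zA-Z0-9_:]` become `_`, common units get a suffix, and monotonic sums get `_total`. The exact rules depend on the Collector version and exporter settings, so treat this as a sketch built on those assumptions, not as the translator's actual implementation.

```typescript
// Simplified approximation of OTLP -> Prometheus metric name translation.
// Only a few common units are mapped; the real translator covers far more cases.
const UNIT_SUFFIX: Record<string, string> = { s: "seconds", ms: "milliseconds", By: "bytes" };

function toPrometheusName(otelName: string, unit: string, isMonotonicSum: boolean): string {
  // Prometheus metric names allow [a-zA-Z0-9_:]; anything else (e.g. ".") becomes "_".
  let name = otelName.replace(/[^a-zA-Z0-9_:]/g, "_");
  const suffix = UNIT_SUFFIX[unit];
  if (suffix && !name.endsWith(`_${suffix}`)) name += `_${suffix}`;
  // Monotonic sums (counters) conventionally end in "_total".
  if (isMonotonicSum && !name.endsWith("_total")) name += "_total";
  return name;
}

console.log(toPrometheusName("http.server.request.duration", "s", false)); // http_server_request_duration_seconds
console.log(toPrometheusName("app.orders.created", "", true)); // app_orders_created_total
```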

You're literally the first person I've met to advocate for Prometheus SDKs over OpenTelemetry SDKs.

https://www.timescale.com/blog/prometheus-vs-opentelemetry-metrics-a-complete-guide/

Enity commented 1 month ago


I don't prefer Prometheus; I said that our company uses it. When we started, OpenTelemetry was still in its infancy. Large companies don't switch to newer technologies without substantial reasons.

For this project, sure, let's go with OpenTelemetry.