PayU / prometheus-api-metrics

API and process monitoring with Prometheus for Node.js micro-service
Apache License 2.0
130 stars 44 forks source link

Express routes like /a/* result in prometheus route /:0/* for request /a/a #114

Open kennsippell opened 1 year ago

kennsippell commented 1 year ago

Given this express route:

app.get('/wildcard/*', (req, res, next) => {
    res.json({ status: 'ok' });
})

Here is a test that fails:

        describe('when using wildcard with repeating string', function() {
            before(() => {
                return supertest(app)
                    .get('/wildcard/wildcard')
                    .expect(200)
                    .then((res) => {});
            });
            it('with no repeating strings', () => {
                return supertest(app)
                    .get('/metrics')
                    .expect(200)
                    .then((res) => {
                        expect(res.text).to.contain('method="GET",route="/wildcard/*",code="200"');
                    });
            });
        });

Here are some inputs for the get() value and current prometheus route. /wildcard/* is expected for all inputs.

Get Input Prom Route
/wildcard/something /wildcard/*
/wildcard/wild /:0card/*
/wildcard/wildcard /:0/*
kennsippell commented 1 year ago

This is caused by this code block supporting nest.js. In express, the value of req.params is {"0":"wildcard"} and so wildcard is replaced with :0.

I'd love to fix this, but I'm struggling because I don't really understand the intent of this mentioned code block. If I remove it, no existing tests fail. I found the commit which introduced it, but not much detail to explain it.

Any chance somebody would provide a sample input/output case to explain why this block matters?

kennsippell commented 1 year ago

I'm playing around with Nest.js scenarios attempting to understand this code block. I can reproduce similar oddities in Nest.js routes. For example, in test/integration-tests/nest-js/middleware-test.spec.ts I'm seeing:

    describe('#114 Reproduce in Nest.js', () => {
        before(() => {
            return request(server)
                .get('/users/app-id/app-id/456')
                .expect(200)
                .expect({
                    app_id: 'some_app_id'
                });
        });

        it('should add it to the histogram', () => {
            return request(server)
                .get('/metrics')
                .expect(200)
                .then((res) => {
                    expect(res.text).to.contain('method="GET",route="/users/:user_id/:user_id/:app_id",code="200"');
                });
        });
    });
Input Prom Route Result
/users/123/app-id/456 /users/:user_id/app-id/:app_id Nominal
/users/app-id/app-id/456 /users/:user_id/:user_id/:app_id Bug?
/users/s/app-id/p /u:user_iders/:user_id/a:app_idp-id/:app_id Bug?

I'm unclear why it would ever be desirable to have user-controlled inputs alter the monitoring route. Still playing around with Nest routing to find routes which make this code desirable.

If I can't find any, there are none captured in tests, and nobody can indicate why this is useful - can I propose a PR which removes this code block?

kobik commented 1 year ago

This is caused by this code block supporting nest.js. In express, the value of req.params is {"0":"wildcard"} and so wildcard is replaced with :0.

I'd love to fix this, but I'm struggling because I don't really understand the intent of this mentioned code block. If I remove it, no existing tests fail. I found the commit which introduced it, but not much detail to explain it.

Any chance somebody would provide a sample input/output case to explain why this block matters?

@kennsippell , i wasn't involved too much in the development of this package from the beginning, but i can tell you that the reason we alter the routes is that we use the request path as a prom label and we must ensure that the every label we define has low cardinality. which is something we can't guaranty unless we replace the url params that are input that we can't control with constant values. hence, we replace the value with :<paramName>.

having labels with high cardinality that are not bound might cause our app or prometheus to get to OOM.