evanderkoogh / otel-cf-workers

An OpenTelemetry compatible library for instrumenting and exporting traces for Cloudflare Workers
BSD 3-Clause "New" or "Revised" License
208 stars 39 forks source link

Accurate timings when instrumenting code that doesn't do network IO #114

Closed johtso closed 1 month ago

johtso commented 2 months ago

So currently you get pretty useless timings when the code you're wrapping in a span doesn't do any network IO as the current time doesn't increment for security reasons.

The standard way to work around that is just to ping a URL to force time to update, e.g.:

async function ping() {
  await fetch('http://one.one.one.one', {method: 'HEAD'})
}

The issue here is that you need to sprinkle these pings in various places, and you also end up with your traces being polluted by these requests.

Would it maybe be possible to have the auto instrumentation ignore certain urls? Is this maybe something that's possible with Span processors in the config?

DaniFoldi commented 2 months ago

Hi @johtso,

I have actually considered something like this, however I believe it falls outside of the scope of the library for a simple reason: we can't know what urls you cant ignored, and we can't know where to put these fetches automatically. And if we added it everywhere, that adds unnecessary latency and subrequests, which people will wonder why they run out of earlier. You can achieve this yourself with some filtering in the span processor, as you mentioned, you can hide it simply by comparing the request url to your test url.

To simplify your use case, I actually have two suggestions for you: querying a domain that does not exist, such as https://hahaha-random.subdomain.that-does-not-existforsure-withsome-random2349877823788.com is usually much faster, 2-3ms if cached versus 30-50ms to one.one.one.one - won't save much cpu, just reduces overall request duration.

The other thing is, you can wrap your compute-heavy workload in a callback like this:

async function compute<T extends () => unknown>(cb: T): Promise<ReturnType<T>> {
  await fetch('https://hahaha-random.subdomain.that-does-not-existforsure-withsome-random2349877823788.com')
  return tracer.startActiveSpan(span => {
    const r = cb()
    await fetch('https://hahaha-random.subdomain.that-does-not-existforsure-withsome-random2349877823788.com')
    span.end()
    return r
  })
}

Your wrapper function becomes async, but you get timing info in your span with minimal added latency - and you retain types of cb.

johtso commented 2 months ago

Great tip with the non existent URL! I was definitely curious if there might be a faster endpoint to hit.

Totally understand that making something like this automatic might be one step too far.. as a middle ground maybe it would be a nice thing to mention in the docs?

If I did want to ignore fetches to certain URLs where would I start looking?

DaniFoldi commented 2 months ago

Adding it to the docs sounds reasonable, agreed. We could maybe export such a utility function from the lib even, as a semi-automatic instrumentation.

If you want them filtered out, the lib has a postProcessor option in the config, that you can set up like:

  postProcessor: spans => {
    return spans.filter(span => span.attributes?.['url.full'] !== 'https://github.com/')
  },

of course, replace the url with your exact url to be ignored. You can narrow the filter further if required, just bear in mind that attributes is a flat object, hence the ['url.full'] key.

jahands commented 1 month ago

We now export __unwrappedFetch() which should allow for hiding fetch spans that you don't want.

I agree with @DaniFoldi that inserting fetch calls to try to capture more accurate timings is out of scope of this library.

johtso commented 1 month ago

@jahands if I wanted to automatically sprinkle in fetches to ensure accurate timings, is there somewhere I could hook in to do that before and after an instrumented action is called?