evanderkoogh / otel-cf-workers

An OpenTelemetry compatible library for instrumenting and exporting traces for Cloudflare Workers
BSD 3-Clause "New" or "Revised" License
208 stars 39 forks source link

[Bug] `trace.getActiveSpan()` returns undefined, regression #115

Closed johnpyp closed 1 month ago

johnpyp commented 2 months ago

Reproduction repo: https://github.com/johnpyp/otel-broken-test

Basically, just taking the bare minimum example from the readme on a bare bones hello world worker, doesn't work currently to get the active span. Maybe a regression, or maybe broken by recent developer week updates from Cloudflare?

johnpyp commented 2 months ago

I tested this with a couple different rc versions randomly, and found that rc17 seem to work just fine. rc32, the latest, is breaking. rc29 I think is also broken

johnpyp commented 2 months ago

I believe the regression was first introduced in rc.26, which happens to be the same version a lot of the otel dependencies were updated, so I'm guessing that's the issue.

johnpyp commented 2 months ago

Forcing my version of @opentelemetry/sdk-trace-base to 1.17.1 and @opentelemetry/api to 1.6.0 makes it work with 1.32.

DaniFoldi commented 2 months ago

Hey @johnpyp,

Thanks for including a reproduction, I've taken a look and I think I have a guess at the root cause. There was not change in rc.26 to any dependencies, and rc.27 restricted from ^1.6.0 to ~1.6.0: https://npmdiff.dev/%40microlabs%2Fotel-cf-workers/1.0.0-rc.26/1.0.0-rc.27/

What this means in practice is that your package manager will install 1.6.x for the otel lib, and ^1.8.0 that resolves to the latest 1.x.x, the two of which cannot be satisfied by any single version. Now from here, you have two copies of the otel api, which relies on some magic shared internal globals, but since there are two versions in your code, you end up with the two versions never communicating to each other.

As for a solution, we are careful about updating otel dependencies, since they have randomly broken everything in the past. I am working on an e2e test setup that we can verify our changes against, until then, I recommend that you downgrade to 1.6.x (we should declare that as a peerDep, added it to my list), or override our version to 1.8.x, so you only have a single version to import from, and thus exports will be shared. Please let us know if that worked for you, I also can't wait to update and would be nice to resolve all issues before we bump officially.

johnpyp commented 2 months ago

Got it, that makes sense. Yeah downgrading to 1.6.0 seems to have worked.

tsteckenborn commented 1 month ago

That took me quite some time today as well...

johnpyp commented 1 month ago

If possible, it'd be great for otel-cf-workers to detect the issue and warn (when not in production).