getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io
Other
38.54k stars 4.12k forks source link

Better support for publicly hosted sourcemaps and the GitHub integration: make mappings apply on resolved urls, not sources #63082

Open dgoldstein0 opened 8 months ago

dgoldstein0 commented 8 months ago

Problem Statement

Today, the Github Sentry integration requires mappings to know that certain prefixes of sources fields in javascript sourcemaps (.map files) map to a given path in a specific github repo. The trouble is that when sources are relative, they should be (a) relative to the baseUrl field of the sourcemap which is itself (b) resolved relatively to the folder containing the file. Sometimes the sources could be absolute urls or at least domain relative, or perhaps even absolute paths to files on disk (which wouldn't make sense to publish). But when relative urls are used, they should be properly resolved. The trouble is that the mappings field of the github integration doesn't do that resolution - instead just making overbroad assumptions like "../../../../" is always our repo root (only true because the file happens to be nested 4 folders deep - deeper or shallower will be possible). So then if we had a sourcemap that used ../../ on each source, or worse ../../../../../ - any different level of nesting - then likely it's going to resolve to the wrong github url based on the current mapping mechanism.

Solution Brainstorm

What we should be configuring sentry is that maps to , and from there be able to tell it "here's the url of our repo root" or "here's the url of this folder in the repo" and then, when a source in a sourcemap resolves to a url with that prefix, it would apply the matching mapping and get a valid github url from that. Since this would be using baseUrl and resolving baseUrl relative to the url the sourcemap was fetched from, this would correctly account for relative urls.

Product Area

Issues - Source Maps

┆Issue is synchronized with this Jira Improvement by Unito

getsantry[bot] commented 8 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 8 months ago

Routing to @getsentry/product-owners-issues-source-maps for triage ⏲️

lforst commented 8 months ago

Thanks for bringing this up. I am a bit sceptical as to whether we should solve this from the Product side any further.

I think Sentry would not always be able to infer the right position from the information you suggested to provide to Sentry. But admittedly this is only a gut feeling I didn't think it through a 100%.

What I think we should do instead is fixing your sources field. Most bundlers, have an option that let's you configure a sort-of base path for the sources field. In rollup you have output.sourceMapPathTransform, in tsc there is sourceRoot. Generally configuring that to the right location should be way more robust.

dgoldstein0 commented 7 months ago

So I think you are making a lot of assumptions about our build and deploy processes which aren't true in all cases.

For starters: our legacy serving stack does not use an open source bundler, and the custom thing that we have for it that you could call a "bundler" is going to be killed, probably later this year. After this, we'll be serving just individual JS AMD files compiled by Typescript. And loaded with requirejs. Yeah, we still have requirejs in 2023... it's a long story with copious amounts of path dependence.

Anyhow we'll likely try to upgrade to shipping ESM modules without requirejs for that legacy stack, but that doesn't really change the non-bundler calculus: we use bazel to build all of our code, and bazel-orchestrated typescript gets used in many different future compilation steps - rollup for our new tech stack, webpack for various other internal web properties, and probably a whole variety of other things. An important thing that makes our builds function efficiently is that we use the same typescript (up to the selected module format) for every consuming build step - meaning that we must maintain a strict separation of concerns between the this shared part of the build and which urls the built artifacts get shipped to. The only way to maintain that separation of concerns is to use relative urls for baseUrls of the typescript-generated sourcemaps. At which point we then have the headache that if we every try to put a bunch of sourcemap files with relative baseUrls into the same page that then have their exceptions reported to sentry, then we basically can't configure the mapping correctly for all of the different sourcemaps at once.

I suspect, but haven't seen strong proof yet, that the current mappings will also be a problem for our more modern stack as while most of the code uses rollup, there are various bespoke components that don't. To be determined if this shortcoming is an issue there.

That said the assumption we've long run with is that if we can make the sourcemaps resolve in the browser, then our exception mapping pipeline would support them - we built our own integration with our in-house exception logging backend, and that part works pretty well - though with a hardcoded => mapping, which if we were continuing development on that probably would end up configurable and is equivalent to what I'm asking here.

lforst commented 7 months ago

Maybe I am stupid but I still fail to understand the problem. Sentry should, in theory, already provide you with everything you need. Would you mind explaining your issue to me like I am 5? You have to assume I know nothing about your setup.

dgoldstein0 commented 6 months ago

here's a small source file from our codebase

// metaserver/static/js/experiments/utils.ts
import {Features} from 'metaserver/static/js/experiments/types';

type QueryKey = readonly [string, Features];

export function getQueryKey<FeatureName extends Features>(featureName: FeatureName): QueryKey {
  return ['feature', featureName];
}

our build process transforms it to

// bazel-bin/metaserver/static/js/experiments/utils.js
define(["require", "exports"], function (require, exports) {
    "use strict";
    Object.defineProperty(exports, "__esModule", { value: true });
    exports.getQueryKey = void 0;
    function getQueryKey(featureName) {
        return ['feature', featureName];
    }
    exports.getQueryKey = getQueryKey;
});
//# sourceMappingURL=utils.preenvjs.map

// bazel-bin/metaserver/static/js/experiments/utils.preenvjs.map
{"version":3,"file":"utils.js","sourceRoot":"./","sources":["utils.ts"],"names":[],"mappings":";;;;IAIA,SAAgB,WAAW,CAA+B,WAAwB;QAChF,OAAO,CAAC,SAAS,EAAE,WAAW,CAAC,CAAC;IAClC,CAAC;IAFD,kCAEC","sourcesContent":["import {Features} from 'metaserver/static/js/experiments/types';\n\ntype QueryKey = readonly [string, Features];\n\nexport function getQueryKey<FeatureName extends Features>(featureName: FeatureName): QueryKey {\n  return ['feature', featureName];\n}\n"]}

Very important detail here: the sourceRoot is ./, and the sources don't include a path to the folder. That means the source comes from the same folder as the sourcemap. (This is a little bit of a lie since bazel-bin/ is actually a separate output folder, but we put the sourcesContent inline and pretend bazel-bin/ is actually the root of our repo, which works well enough)

when we deploy these two files end up at https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/utils-vfl.js and https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/util.preenvjs-vfl.map respectively - currently those would be https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/utils-vfl6FvtBm.js and https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/utils.preenvjs-vflmUpbJ2.map (maybe not visible to you - we enforce ip restrictions - but we opened those up to sentry's published ips). (my bug: sourceMappingURL in the .js file should be transformed with the -vfl hash for the sourcemap; I'll get on fixing that, but for the sake of this bug we can assume it's transformed)

We tell sentry to use externally hosted sourcemaps, so we then expect that when https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/utils-vfl6FvtBm.js is involved in a stack trace - however the browser formats that - that sentry will follow to the sourcemap. From that, sentry should see that utils.ts is the source file, and it's in ./ relative to https://cfl.dropboxstatic.com/static/metaserver/static/js/experiments/ - which we want then to map to metaserver/static/js/experiments/utils.ts in our github repo.

The trouble is, it can do this exact same logic for any other file in our codebase - if we were talking about an error including https://cfl.dropboxstatic.com/static/foo/bar.js, the sourcemap would also have sourceRoot=./ and sources=["bar.ts"] and we'd want it to map to foo/bar.ts in our github.

As is I don't see any way we can have sources=[util.ts] (from /static/metaserver/static/js/experiments/util.preenvjs-vfl.map and sources=[bar.js] (from /static/foo/bar.preenvjs-vfl.map) use the mappings option of the github integration to understand that these two sources are in completely different folders. The sourceRoot and location of the sourcemap file must be taken into consideration in order to locate the original sources from these sourcemaps - most of the location of the source files are implicit, from the location of the sourcemap itself.

does that make sense?

as for what we're seeing in sentry, I don't think we've turned on exception reporting to sentry from the pages that rely on this behavior yet.

lforst commented 6 months ago

Hi, thanks for taking the time to write this out. I now understand the problem.

@Swatinem @loewenheim can I hand this over to you?

Swatinem commented 6 months ago

We should already be joining these paths together in a proper way:

Can you link to a specific event on Sentry that I can look at?

dgoldstein0 commented 3 months ago

apologies for the silence here.

I don't have a specific event yet, because we haven't tried to roll out sentry on our legacy web platform that's I believe is going to have this problem. That will likely happen later this year at which point me or one of my coworkers will get back to you.