getsentry / team-webplatform-meta

Fingerprinting Improvements #15

Closed HazAT closed 1 year ago

HazAT commented 1 year ago

Problem Statement:

We want to improve fingerprinting for specific platforms to make default grouping "better".

### Generic Tasks
- [x] Write docs describing the steps to get new rules into prod
### Improved Fingerprinting
- [x] JS
- [x] Python
- [x] Ruby
- [x] PHP
- [x] Mobile
- [x] Java
- [x] Go
- [x] Test cases

PR: https://github.com/getsentry/sentry/pull/44761

Related: https://github.com/getsentry/sentry/pull/43940

Related in-app improvements for Python: https://github.com/getsentry/sentry-python/issues/1754

AbhiPrasad commented 1 year ago

After some initial investigation, we've gathered our results in an internal Notion doc.

In that document we've outlined the top frames for each platform (frames that appear more than 3 times). For example, here are the top frames for Python.

{
  "python": {
    "stack.abs_path:**/Logger.py -app -group": 7,
    "stack.abs_path:**/__init__.py -app -group": 7,
    "family:native max-frames=1": 3
  }
}

Each SDK maintainer can probably examine these frames and see if they make sense to add to the general grouping rules.
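For anyone redoing this kind of tally on their own data, here is a minimal sketch of the counting step. The input shape (one list of rule strings per project), the function name `top_rules`, and the `threshold` parameter are all assumptions made for illustration; this is not the script behind the Notion doc.

```python
from collections import Counter


def top_rules(rules_by_project, threshold=3):
    """Return rules that appear in more than `threshold` projects.

    `rules_by_project` is a hypothetical input shape: an iterable of
    lists, each holding the stack trace rule strings of one project.
    """
    counts = Counter()
    for rules in rules_by_project:
        # Count each rule once per project so a single project that
        # repeats a rule cannot inflate the total.
        counts.update(set(rules))
    return {rule: n for rule, n in counts.most_common() if n > threshold}
```

Counting per project rather than per occurrence is a design choice of this sketch; counting raw occurrences would only require dropping the `set(...)` call.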

There are also some platforms that are worth investigating deeper based on my preliminary analysis:

  1. java
  2. javascript/javascript-react/javascript-angular/javascript-vue/javascript-electron
  3. node
  4. native / minidump
  5. android
  6. apple-ios
  7. go
  8. php

Please DM me and I can share this data with you.

The rest didn't seem that interesting, but maybe worth another pair of :eyes:.

An important thing that came up was grouping, or marking frames as in-app, by company name (a common module name prefix). For example, with Java the rule `stack.module:COMPANY_NAME.* +app` showed up quite a bit.

Can we somehow detect these common package names and add them to grouping rules? From @mitsuhiko on that Notion doc:

In Python we have a setting that customers can set to configure this on the client side. Having a convenient custom setting for this might be really neat, and more importantly, we might even check regularly based on submissions and other info if we should auto populate that setting maybe? Shouldn’t take too long to auto detect the package names.

Maybe we can even reuse the transaction name classifier?
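To make the auto-detection idea concrete, here is a rough sketch of one possible heuristic: take the leading segments of each frame's module name, skip well-known runtime prefixes, and propose an `+app` rule for any prefix that dominates. The function name, the `depth` and `min_share` parameters, and the list of runtime prefixes are hypothetical, and this is unrelated to how the transaction name classifier works.

```python
from collections import Counter

# Assumed list of runtime/platform prefixes that should never count as in-app.
RUNTIME_PREFIXES = ("java.", "javax.", "kotlin.", "android.")


def candidate_in_app_rules(modules, depth=2, min_share=0.5):
    """Suggest `stack.module:<prefix>.* +app` rules from observed modules.

    `modules` is an iterable of dotted module names taken from frames.
    A prefix is suggested when it covers at least `min_share` of the
    non-runtime modules.
    """
    prefixes = Counter()
    for module in modules:
        if module.startswith(RUNTIME_PREFIXES):
            continue
        parts = module.split(".")
        if len(parts) < depth:
            continue  # too short to carry a company-style prefix
        prefixes[".".join(parts[:depth])] += 1

    total = sum(prefixes.values())
    if total == 0:
        return []
    return [
        f"stack.module:{prefix}.* +app"
        for prefix, n in prefixes.most_common()
        if n / total >= min_share
    ]
```

For example, `candidate_in_app_rules(["com.acme.billing.Invoice", "com.acme.api.Handler", "org.example.Util"])` suggests a single rule for `com.acme`, since that prefix covers two of the three non-runtime modules.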

Finally, there are some open questions we want to answer:

  1. Do we want to change grouping based on usage of common concurrency libraries/functions? For example, stack.module:*java.lang.Thread*.
  2. Do we want to change grouping based on the presence of other APM providers (New Relic, Datadog, etc.)?
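One way to size these questions before deciding is to measure how many stacks a candidate matcher would touch. The sketch below assumes each stack is a list of module names and uses Python's `fnmatch` as a stand-in for glob matching; Sentry's own matcher may behave differently, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def share_of_stacks_matching(stacks, module_glob):
    """Return the fraction of stacks with at least one matching frame.

    `stacks` is a list of stacks, each a list of module name strings.
    `module_glob` is a shell-style glob such as "*java.lang.Thread*".
    """
    if not stacks:
        return 0.0
    hits = sum(
        1
        for frames in stacks
        if any(fnmatchcase(module, module_glob) for module in frames)
    )
    return hits / len(stacks)
```

Note that a glob like `*java.lang.Thread*` also matches `java.lang.ThreadLocal` under this stand-in, which is worth keeping in mind when reading the resulting number.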