getsentry / team-webplatform-meta

0 stars 0 forks source link

SDK Starfish v0 requirements #60

Closed AbhiPrasad closed 1 year ago

AbhiPrasad commented 1 year ago

Add cache hit/miss rate

Any call to cache that doesn't return data (null) is treated as a miss. We define a span that has a cache hit as having the span data field cache.hit as true. Might require SDK to patch cache abstractions of framework (django).

To be done in Python SDK for any frameworks that support this - django is required.

### Cache hit/miss rate
- [ ] https://github.com/getsentry/develop/pull/917
- [ ] https://github.com/getsentry/sentry-python/issues/2043

Add db platform data to to span data

Add this information to span data field as db.system, matching OpenTelemetry's well known conventions.

For example, db.system of postgresql would indicate that it's a postgres database.

### DB platform data.
- [ ] https://github.com/getsentry/develop/pull/917
- [ ] https://github.com/getsentry/sentry-python/pull/2035
- [ ] https://github.com/getsentry/sentry-javascript/pull/7952
- [ ] https://github.com/getsentry/sentry-python/pull/2037
- [ ] https://github.com/getsentry/sentry-python/pull/2038
- [ ] https://github.com/getsentry/sentry-python/pull/2039
- [ ] https://github.com/getsentry/sentry-python/pull/2040
- [ ] https://github.com/getsentry/sentry-python/pull/2042

Add cache item size

Add a field to cache spans that defines how big the item that is being get/set: cache.item_size. This is an integer and should be in bytes. Blocked by cache hit/miss rate work.

### Cache Item size
- [ ] https://github.com/getsentry/develop/pull/933
- [x] Add `cache.item_size` to Python SDK

Add information used for action field

The action field on the extracted span metrics is either the operation for DB spans (SELECT/DELETE etc.), and http method for HTTP spans (GET/PUT etc.).

For operation, we'll be using and setting db.operation on span data. For http method, we'll be using http.method on span data.

If db.operation OR http.method is not set on span data, Relay will have to parse it out from the span description.

### Action
- [ ] https://github.com/getsentry/develop/pull/924
- [ ] https://github.com/getsentry/sentry-python/pull/2053
- [ ] https://github.com/getsentry/sentry-python/pull/2054
- [ ] https://github.com/getsentry/sentry-javascript/pull/7990
- [ ] https://github.com/getsentry/sentry-javascript/pull/7991
wmak commented 1 year ago

From the Slack thread we'll also need:

AbhiPrasad commented 1 year ago

We can add all of the db call-level attributes if possible: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/database/#call-level-attributes: db.operation.

db.operation in this case would be the field that would eventually be extracted to be action. Note that we can only provide db.operation if the underlying framework gives us this information, otherwise it will have to be extracted from the query - which has to be done server-side.

I'm still confused about span status - we attach http status codes under span.status already on the SDK. Can we not check for a 504 or a 408 and label timeout appropriately? Why does this require SDK changes?

mitsuhiko commented 1 year ago

What is a client timeout here? Presumably if something times out, you never get a response and thus no status code. What do SDKs report in that case? I wouldn't be surprised if a timeout usually translates into some sort of socket error which requires custom handling and might either result in a sentry error (and failed transaction), or is manually handled and the output is undefined.

AbhiPrasad commented 1 year ago

Based on a slack thread - here's a snippet from @gggritso about behaviour that he would like to see for timeouts.

import urllib
import urllib.request

try:
    response = urllib.request.urlopen('http://python.org/', timeout=0.0001)
    html = response.read()
except urllib.error.URLError as e:
    print("URL Error", e.reason)
    if (e.reason == 'timed out'):
        span['status'] = 'time out'
        raise e
AbhiPrasad commented 1 year ago

Given we've released the Python SDK with cache hit information, I think we can say we have enough data for v0 of starfish.

action (db operation, http method) is going to be parsed by Relay for the most part (but both Node/Python should be sending them regardless)

span status is still an open concern, but I think we can defer that will v1. cc @alexjillard

smeubank commented 1 year ago

the SDK should be sending http.method where possible under span data. In both node/python it was determined that none of the underlying instrumentation could parse and determine db.operation so this was something relay needed to do.

So Abhi doesn't have to sign into GH while OOO