add visual indication when sentry truncates dict data

ghazi-git commented 1 year ago

Problem Statement

My understanding is that sentry has a limit on the amount of data it collects, so it sometimes truncates the data to make sure it doesn't exceed that limit. This is especially clear for long strings (like queries in the breadcrumbs section) with the added ellipsis at the end, but not so much for python dicts.

Here's one example of what I see when an error occurs while receiving a Postmark webhook (and here's the complete object for reference). In the above example, I'm sure Sentry has truncated the data because I can see the Postmark metadata in one frame of the stack trace (we assign the metadata to a separate variable with metadata = payload["Metadata"], and I can see the contents of that variable).

The problem is that the dict representation doesn't tell me whether Sentry has truncated the data. So now, for each error, I need to think twice: did Sentry truncate the dict data, or did the client of my code not submit the key I'm expecting to find?

Solution Brainstorm

I think a simple ellipsis at the end of each truncated dict would be a good visual indication that sentry has truncated the data.

{
  "key1": "value1",
  "key2": "value2",
  ...
}

However, I'm not sure what happens to existing data, especially if Sentry didn't previously record whether variables were truncated.

Lms24 commented 1 year ago

Hi @ghazi-git thanks for writing in. I'm not yet sure that I understand what you're asking for exactly. For instance, where do you expect this data to show up/how do you add it to the Sentry event? Are you using the @sentry/node SDK? If yes, are you using the RequestData integration?

Depending on the structure of your data, we might indeed normalize it to e.g. a certain depth. To adjust this, have you checked out the normalizeDepth and normalizeBreadth options?

ghazi-git commented 1 year ago

My bad @Lms24, I didn't realize I was posting this to getsentry/sentry-javascript. Maybe the right place for it is getsentry/sentry-python or getsentry/sentry? I'm using the Python SDK, and the data was all automatically collected by sentry-sdk.

My problem is with data display in the issue page in sentry cloud UI. Data was collected automatically when an error happened on the server, it seems like it was normalized according to the configuration options you mentioned, then sent to sentry servers. After that, in the issue page in sentry UI, the data is displayed but there is no visual indication that it was normalized. While this can be considered my bad for not knowing about the normalization options, it would be great if a visual indication was added whenever the data was normalized. That way it's clear to anyone looking at the issue page while debugging a prod error that part of the data is missing due to data normalization.

Lms24 commented 1 year ago

Ahh I see, let me transfer this issue to the python SDK repo for further triage :) (now "dict" data makes much more sense to me 😅). If it turns out that there's nothing SDK wise to change, we'll transfer it to the main getsentry/sentry repo.

Lms24 commented 1 year ago

cc @antonpirker - would you mind taking a look at this?

antonpirker commented 1 year ago

Hey @ghazi-git! Yes, that makes sense (I thought it would add some indication if the body is truncated). We have put this on our backlog, but cannot promise an ETA because we have quite a lot on our plate.

But if you want to give it a go and submit a PR, this is always very welcome! (A probable solution would be to wrap the truncated body in an AnnotatedValue field.)

ghazi-git commented 1 year ago

Hearing that this is on your radar is good enough for me. Still, I'll try to take a look at it once I get some time.

ghazi-git commented 1 year ago

After being lost for a while in the recursive calls inside serialize, I think sentry-sdk is sending the necessary info to notify sentry server that the request data was truncated. In the screenshot below, notice how meta_stack has info about the length of the request payload which is 15.

While I don't fully understand how everything works, my guess is that sentry-sdk sends the original length of the request data when it truncates it. In this case, the request data originally had 15 keys, but the request data in the event sent to Sentry servers has only 10 keys (10 is also the value of MAX_DATABAG_BREADTH).
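To make that guess concrete, here is a minimal sketch of the idea in plain Python. It is not the sentry-sdk serializer: truncate_with_meta and render are hypothetical helpers, and the {"len": ...} metadata shape is an assumption based on the original length that shows up in meta_stack. It only illustrates how a recorded original length would be enough for a UI to show the ellipsis proposed at the top of this issue.

```python
# Illustrative only: NOT the sentry-sdk implementation, just a sketch of the idea.
MAX_DATABAG_BREADTH = 10  # the value mentioned in this thread


def truncate_with_meta(data, max_breadth=MAX_DATABAG_BREADTH):
    """Keep at most max_breadth keys; record the original length if keys were dropped."""
    kept = dict(list(data.items())[:max_breadth])
    meta = {}
    if len(data) > max_breadth:
        # Hypothetical annotation shape: the original number of keys.
        meta = {"len": len(data)}
    return kept, meta


def render(kept, meta):
    """Render the dict, adding an ellipsis line when the metadata says keys are missing."""
    lines = ["{"]
    lines += [f"  {key!r}: {value!r}," for key, value in kept.items()]
    if meta.get("len", len(kept)) > len(kept):
        lines.append(f"  ...  # {meta['len'] - len(kept)} more keys truncated")
    lines.append("}")
    return "\n".join(lines)


payload = {f"key{i}": f"value{i}" for i in range(1, 16)}  # 15 keys, as in my example
kept, meta = truncate_with_meta(payload)
print(render(kept, meta))
```

With the 15-key payload, kept holds 10 keys and meta carries the original length, so the renderer can tell a truncated dict apart from one where the client simply did not send a key.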

If what I mentioned above checks out, I think this issue can be moved to the getsentry/sentry repo for further review on their end.

sentrivana commented 7 months ago

Transferring this to the main Sentry repo -- the payload from the SDK should already contain metadata about what was truncated, so it should just be a matter of how to display it nicely in Sentry.

As a sidenote, on the SDK side we've since added max_request_body_size="always" (see docs) which will prevent any truncation.
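For anyone looking for a stopgap, a minimal sketch of that setting is shown below. The DSN is a placeholder, and this only concerns the request body as described above; whether it affects other truncated data, such as local variables in stack frames, is not covered here.

```python
import sentry_sdk

sentry_sdk.init(
    # Placeholder DSN: replace with your own project's DSN.
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    # Option mentioned above: always capture the request body.
    max_request_body_size="always",
)
```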

getsantry[bot] commented 7 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 7 months ago

Routing to @getsentry/product-owners-issues for triage ⏲️

MichaelSun48 commented 7 months ago

Hey there @ghazi-git! 👋 This is really good feedback - definitely agree that the issue details page should be transparent about which parts of the body we are truncating. I'm going to add this to the feature request backlog for now and surface it with the issue details team today. Thanks for writing in about this!

sl0thentr0py commented 4 months ago

another request https://github.com/getsentry/sentry-python/issues/3262

aberres commented 3 months ago

I am also running into stripped dicts and lists.

Having an error message telling you about duplicates in a list without any visible duplicates is, well, unfortunate.

tsx commented 1 month ago

Hi there! At Close we recently migrated to Sentry from a competitor, and we love the product so far, but this truncation issue is tripping us up regularly. Our engineers often get confused about what data was in the input while debugging issues. At times it feels like it'd save us time to not send the argument/locals data to Sentry at all.

Are there any workarounds we could apply?

getsentry / sentry

add visual indication when sentry truncates dict data #68426
