getsentry / team-sdks

A meta repository for tracking work across all SDK teams.
0 stars 0 forks source link

[Chore] Hash all set Metric values #85

Open elramen opened 3 weeks ago

elramen commented 3 weeks ago

Description

  1. All SDKs should stringify (or only accept strings) and hash (using the same hashing algorithm, e.g., crc32) all set metric values before emitting them.

  2. Update the developer docs to reflect this.

  3. Maybe also add a set of test cases in the developer docs so that sdk developers can verify that their implementation works as expected. This would be good if the stringification or the hashing algorithm works differently on different platforms. For instance, zlib.crc32("hello-world".encode("utf-8")) returns a negative int in python2 but an unsigned int in python3.

Why should we be doing this?

Currently, the metrics developer docs' example for set metric values show that: strings should be hashed (using crc32) and then converted into u32, integers of any size (signed or unsigned) should not be hashed, and floats should be floored and not hashed. This is how it's implemented in the python sdk atm which is referred to by the docs as the example implementation to follow. Relay will try to parse the value as a u32, and if that fails, the value will be hashed (by a different algorithm than crc32).

There are a few problems with this:

  1. All SDKs need to follow the several rules above to ensure accurate data for users updating the same set metric from different SDKs. This can be a bit complicated and it is also not the case atm, the implementation differs between SDKs.
  2. The user has to be mindful about the value-type of their input and thereby be aware that different types are hashed differently. This is arguably an implementation detail that the user shouldn't have to worry about.
  3. Connected to the point above: the send-metric command of sentry-cli needs a value-type flag since inputs can be parsed both as numbers and strings. This flag is messy, and again, the user shouldn't have to care about the implementation details regarding hashing.
  4. Relay has to spend unnecessary time hashing things that the SDKs could have done instead.
  5. Floats being floored can confuse the user, e.g., a user would probably expect sending 4.1 and 4.9 to increase the set by 2, not 1.

The proposed solution would fix all of these issues and improve the user experience.

Why now?

N/A

RFC

No response

Slack-Channel

No response

Notion Document(s)

No response

Stakeholder(s)

No response

Team(s)

No response