dockwa / openpixel

Openpixel is a customizable library for building web tracking pixels.
https://engineering.dockwa.com
MIT License
413 stars, 119 forks

Why does Openpixel send some data that might be unnecessary? #117

Open hsynlms opened 4 years ago

hsynlms commented 4 years ago

I just wanted to know why Openpixel sends the below information in events:

  1. Browser name
  2. Mobile device
  3. Timestamp in microseconds
  4. Timezone offset (minutes away from UTC)

The first two can be extracted from the User-Agent header with third-party tools such as 51degrees.
I think this is the most accurate way to detect the browser and the device: let the backend handle the extraction. That gives people the flexibility to parse the User-Agent with their own parser or with a tool of their choice. Dropping these fields from the client would also improve Openpixel's performance, shorten the event URL, and reduce the distributable file size.
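To illustrate the idea of moving detection to the backend, here is a minimal sketch of server-side User-Agent parsing. The function name `parseUserAgent`, the `UaInfo` shape, and the regular expressions are assumptions made for this example; they are not part of Openpixel or 51degrees, and a production setup would rely on a maintained parser rather than a handful of patterns.

```typescript
// Hypothetical result shape for this sketch.
interface UaInfo {
  browser: string;
  isMobile: boolean;
}

// Naive illustration only: real User-Agent strings need a maintained parser.
function parseUserAgent(ua: string): UaInfo {
  // Order matters: Edge and Opera UAs also contain "Chrome",
  // and Chrome UAs also contain "Safari".
  const rules: Array<[RegExp, string]> = [
    [/Edg(e|A|iOS)?\//, "Edge"],
    [/OPR\/|Opera/, "Opera"],
    [/Firefox\/|FxiOS\//, "Firefox"],
    [/Chrome\/|CriOS\//, "Chrome"],
    [/Safari\//, "Safari"],
  ];

  let browser = "Unknown";
  for (let i = 0; i < rules.length; i++) {
    if (rules[i][0].test(ua)) {
      browser = rules[i][1];
      break; // first matching rule wins
    }
  }

  return {
    browser,
    isMobile: /Mobi|Android|iPhone|iPad/.test(ua),
  };
}
```

Because this runs on the server, the rules can be updated without shipping a new client bundle.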

The third one can also be calculated on the server side when the event request arrives.

For the last one, my suggestion is to detect the timezone from the visitor's IP address. There are several tools for that, such as ipstack. Is there any need to send it in the request URL?

Here are some good points about the accuracy of JavaScript's getTimezoneOffset method: https://stackoverflow.com/questions/13/determine-a-users-timezone#comment38955981_1809974 https://stackoverflow.com/questions/13/determine-a-users-timezone#comment34982097_1809974
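For context, a small sketch of what the client can report about its timezone. `Date.prototype.getTimezoneOffset` returns minutes and is positive when local time is behind UTC, so its sign is the opposite of the usual "UTC+3" notation; an offset alone also does not identify a named zone. The helper name `describeClientTimezone` and the `ClientTimezone` shape are assumptions for this example and are not taken from Openpixel.

```typescript
// Hypothetical shape for this sketch.
interface ClientTimezone {
  rawOffsetMinutes: number;  // value as returned by getTimezoneOffset()
  minutesAheadOfUtc: number; // same value with the conventional sign
  zoneName: string;          // IANA name as reported by Intl
}

function describeClientTimezone(at: Date = new Date()): ClientTimezone {
  const rawOffsetMinutes = at.getTimezoneOffset();
  return {
    rawOffsetMinutes,
    // Flip the sign so that e.g. UTC+3 becomes +180; avoid producing -0.
    minutesAheadOfUtc: rawOffsetMinutes === 0 ? 0 : -rawOffsetMinutes,
    // The runtime's current default time zone name.
    zoneName: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}
```

Note that the offset depends on the date passed in, since daylight saving time changes it over the year.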

As I said at the beginning, I am just curious about the reasons behind it; all of my opinions are open to discussion.

stuyam commented 4 years ago

Good question. 1 and 2 are just meant to make things easier, so the backend does not have to do as much work. For 2, the mobile check does some front-end checks to try to determine that, but it can probably all be done with the user agent.

3, the timestamp, is important because the client-side value is more accurate. For example, as soon as the page loads, the first thing the event does is save a timestamp; once the library loads, it sends the event with that timestamp. Or if, say, an HTTP request is slow because the server is slow, the event data will still carry the saved timestamp.
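The timestamp argument can be sketched as a queue that stamps each event when it is recorded rather than when it is sent. This is a minimal illustration, not Openpixel's actual implementation; `createQueue`, `QueuedEvent`, and the injectable `now` clock are hypothetical names introduced for the example.

```typescript
// Hypothetical event shape for this sketch.
interface QueuedEvent {
  name: string;
  ts: number; // captured when the event happened, not when it was sent
}

// `now` is injectable so the behavior can be checked with a fake clock.
function createQueue(now: () => number = Date.now) {
  const pending: QueuedEvent[] = [];
  return {
    // Record the event immediately, stamping it with the current time.
    push(name: string): void {
      pending.push({ name, ts: now() });
    },
    // Later (for example once the library has loaded), hand every queued
    // event to a sender. Returns how many events were flushed.
    flush(send: (event: QueuedEvent) => void): number {
      const batch = pending.splice(0); // empties the queue
      for (let i = 0; i < batch.length; i++) send(batch[i]);
      return batch.length;
    },
  };
}
```

A server that only records the arrival time would see the flush time instead, which can be seconds later on a slow page or a slow network.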

4, the timezone, can definitely be obtained in other ways, such as by looking up the IP address.

All of this is to say that some of this data may be useful to some people and redundant or unnecessary to others. So maybe we just make it configurable which data gets sent. These fields are relatively small amounts of data, though, so a configuration option would mainly be useful for people who want to send less.
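One way such a configuration could look is an allow-list of optional fields that is applied when the event URL's query string is built. This is only a sketch of the idea: the field names, the `OptionalField` type, and `buildQuery` are hypothetical and do not reflect an existing Openpixel option.

```typescript
// Hypothetical names for the four fields discussed in this issue.
type OptionalField =
  | "browserName"
  | "mobileDevice"
  | "timestamp"
  | "timezoneOffset";

// Build a query string containing only the fields the site owner enabled.
function buildQuery(
  data: Record<OptionalField, string>,
  enabled: OptionalField[],
): string {
  const parts: string[] = [];
  for (let i = 0; i < enabled.length; i++) {
    const key = enabled[i];
    parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(data[key]));
  }
  return parts.join("&");
}
```

With every field enabled the output matches sending everything, so an option like this could default to the existing behavior and let people opt out of the fields they consider redundant.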

hsynlms commented 4 years ago

It makes sense to keep sending timestamps in events in case queued events are not sent on time. Let's skip the 3rd item in the list.

Sure, people can always ask Openpixel to provide the data they want. But every piece of information has a cost. I think Openpixel should provide only the crucial data and let people send whatever else they want as custom data (if they prefer to send data from the frontend, of course they can).

Most companies care about the file size of the third-party tools they integrate and where those files are served from. It's a best practice to keep integration/analytics files as small as possible, to avoid blocking the UI, to keep things performant, and to serve the file via a CDN. I used to work at a company where I was responsible for the front-end integration process for customer websites. That's why I am insisting on this.

hsynlms commented 3 years ago

My proposal: the 1st and 2nd items should be removed completely because of their poor detection accuracy. I don't think this data is useful to anybody. If someone needs it, they will want correct and up-to-date information. @stuyam