ghostery / trackerdb

Ghostery Tracker Database

[Unidentified Tracker]: fd.nl #110

Open GRadziejewski opened 1 year ago

GRadziejewski commented 1 year ago

Request

fd.nl

Location

https://fd.nl/

Tracker Company/Organization

FD Mediagroep

Company/Organization Website

https://fdmg.nl/en/

Company/Organization Privacy Policy

https://fdmg.nl/privacy-statement/

Describe the company/organization

FD Mediagroep is the information company for the business community in the Netherlands. With brands including Het Financieele Dagblad (FD), BNR, FD Persoonlijk and Company.Info, it provides information on the most important news from a financial and economic perspective to entrepreneurs, corporate executives and professionals.

Category

Switch0XD commented 1 year ago

Hello @GRadziejewski, can you help me get started with this enhancement? Thanks

philipp-classen commented 1 year ago

We currently lack documentation on how to get started (https://github.com/ghostery/trackerdb/issues/117). I hope we can improve that soon; but in a nutshell, adding patterns involves two steps:

  1. Doing some research: Do you see potential tracking requests on the page? If so, what do these requests look like? What company is operating the tracker?
  2. Coming up with detection rules (this can sometimes be relatively easy, but can also be difficult). In simple examples, all requests go to the same domain (e.g. Cloudflare Insights sends to cloudflareinsights.com, which can be expressed as ||cloudflareinsights.com^$3p). The format is the same one that adblocker filters use (see https://github.com/DandelionSprout/adfilt/blob/master/Wiki/SyntaxMeaningsThatAreActuallyHumanReadable.md). For instance, ||cloudflareinsights.com^$3p means all third-party requests that a website sends to cloudflareinsights.com.
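
To make the meaning of ||cloudflareinsights.com^$3p more concrete, here is a rough Python sketch of what such a rule checks. It is not how Ghostery or any adblocker engine implements matching; the function names are made up for illustration, and the "same site" test naively compares the last two hostname labels, whereas real tools consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def naive_site(hostname: str) -> str:
    """Approximate the registrable domain by keeping the last two labels.

    Assumption for illustration only: this is wrong for suffixes such as
    "co.uk"; real engines use the Public Suffix List instead.
    """
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def matches_hostname_anchor(request_url: str, domain: str) -> bool:
    """Mimic `||domain^`: the request host is `domain` or one of its subdomains."""
    host = (urlsplit(request_url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def is_third_party(request_url: str, page_url: str) -> bool:
    """Mimic `$3p`: the request goes to a different site than the page itself."""
    request_host = urlsplit(request_url).hostname or ""
    page_host = urlsplit(page_url).hostname or ""
    return naive_site(request_host) != naive_site(page_host)


def matches_rule(request_url: str, page_url: str, domain: str) -> bool:
    """Approximate `||domain^$3p` for a single request made by a page."""
    return matches_hostname_anchor(request_url, domain) and is_third_party(
        request_url, page_url
    )
```

With this sketch, a request from https://example.com/ to a subdomain of cloudflareinsights.com would match, while a request made by cloudflareinsights.com to itself would not, because it is first-party.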

In this concrete case, I don't think it is straightforward, since the site https://fd.nl/ seems to belong to https://fdmg.nl/. Thus, it is not clear how to categorize FD Mediagroep (categories are described here https://github.com/ghostery/trackerdb/blob/main/docs/categories.md).

Maybe analyzing a real (site analytics) tracker could be easier? For instance, https://github.com/ghostery/trackerdb/issues/33 (tally-1.qubitproducts.com). If you can find out information, feel free to post it in the comments. But adding new patterns is currently a bit tricky, unfortunately.

Switch0XD commented 1 year ago

Can we add a media group category for the Mediagroep tracker?

For issue #33, what exactly do I have to do?

philipp-classen commented 1 year ago

Can we add a media group category for the Mediagroep tracker?

I'm not sure how to best model such cases. I fear that if we go in that direction, we may end up with all kinds of business sectors. For instance, if BMW sent data from one domain to another, we could consequently end up with an entry for BMW (category: car manufacturer/industry?), and so on.

The underlying issue is that in this specific case, it looks to me as if a company is just using multiple domains, but could still be considered one entity. So, it is not really tracking and is also quite isolated, since it does not cover other pages (in contrast, trackers such as Google Analytics are present on many pages and thus can see more user activity).
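
One way to picture the difference between an "isolated" domain and a widespread tracker is to count on how many distinct sites a third-party domain shows up. The sketch below is only an illustration: the function name and the sample observations (all under .example) are hypothetical and do not come from whotracks.me or any real measurement.

```python
from collections import defaultdict


def count_sites_per_domain(observations):
    """Count the distinct sites on which each third-party domain was seen.

    `observations` is an iterable of (site, third_party_domain) pairs.
    """
    seen = defaultdict(set)
    for site, third_party_domain in observations:
        seen[third_party_domain].add(site)
    return {domain: len(sites) for domain, sites in seen.items()}


# Hypothetical sample data, invented for illustration only.
observations = [
    ("publisher.example", "publisher-assets.example"),
    ("news.example", "analytics.example"),
    ("shop.example", "analytics.example"),
    ("blog.example", "analytics.example"),
    ("blog.example", "analytics.example"),  # repeat visits count once
]

reach = count_sites_per_domain(observations)
for domain, site_count in sorted(reach.items()):
    print(domain, site_count)
```

A domain that only ever appears on its owner's own site has a count of one, whereas a domain embedded across many unrelated sites can observe far more user activity.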

For issue #33, what exactly do I have to do?

I would need to add a bit more information. Currently, it is indeed difficult to pick up.

I think it would be easier if there were more test pages. To find test pages, I can look into the raw data that we have from whotracks.me. That raw data is not public, however, so it is something that we have to provide. Having multiple test pages helps to understand what tracking requests look like and makes it possible to test them.

One thing that is a bit more concrete is to help investigate whether it should become a new pattern entry or whether it should be merged into the existing one. I added a comment now (see https://github.com/ghostery/trackerdb/issues/33#issuecomment-1721845920).