Exodus-Privacy / exodus

Platform to audit trackers used by Android application
https://reports.exodus-privacy.eu.org/
GNU Affero General Public License v3.0
625 stars 62 forks

Identify new trackers #40

Closed (U039b closed 5 years ago)

U039b commented 6 years ago

In https://reports.exodus-privacy.eu.org/reports/37/:

U039b commented 6 years ago

Hi @seandiggity @BillCarsonFr @jawz101 @l1git @sanpii

ETIP is now online: https://etip.exodus-privacy.eu.org/ It is a collaborative platform meant to make it easy to create tracker profiles. It is more convenient than opening an issue per tracker.

Feel free to send me an email to exodus@0x39b.fr specifying your desired username + email address and I will send you a temporary password. Once registered, you will be able to freely contribute to the tracker identification process. Cheers!

seandiggity commented 6 years ago

absolutely. awesome. thanks!

jawz101 commented 6 years ago

thanks. right now I'm going through that uniq_list file and removing obfuscated portions, google and android classes, and some things that look generally innocuous. Kinda interesting. Finding some stuff I hadn't seen before.

jawz101 commented 6 years ago

I just went through and added/updated all of the ones I'd collected info for. Do you all expect to reanalyze the apps for any new trackers that've been identified?

And does the progress bar beside each tracker on the https://etip.exodus-privacy.eu.org site mean it won't be ready until 100% completed?

Is someone going through our entries and making fixes? I know some of the gradle entries I put in won't always carry the same version number, and some domains are randomly generated (ex: 234234135.mobileapptracking.com or whatever), so I didn't know whether a better rule gets written for them.

Manu1400 commented 6 years ago

Opentracker

IzzySoft commented 5 years ago

Not sure which of these count as "trackers" (so please don't just copy them over unverified), but all of the below fall into the category "mobile analytics":

Codahale Metrics

Microsoft Azure Analytics

Parse.com

Splunk MINT

FlowUp

Keen Java Clients

kaputnikGo commented 5 years ago

Are you still accepting submissions here for new trackers?

IzzySoft commented 5 years ago

@kaputnikGo with the issue not closed, I assumed so :scream_cat:

jawz101 commented 5 years ago

I would just ask @U039b for access to their etip website. I moved all of my submissions into it directly. But it doesn't look like any I submitted were ever officially added, which stinks because I have several hundred more I could likely find in here :/

https://raw.githubusercontent.com/jawz101/MobileAdTrackers/master/hosts

IzzySoft commented 5 years ago

@U039b if you're indeed no longer accepting submissions here (which I hope is not the case), it might be a good idea to say so (and to close this issue) :wink:

@jawz101 feel free to pick my above reported and add them from your side.

jawz101 commented 5 years ago

@IzzySoft @kaputnikGo There's a way to get an account on that site by emailing him here

seandiggity commented 5 years ago

Exodus Privacy has new leadership and may just not be aware of this github issue. I can send them upstream via https://etip.exodus-privacy.eu.org but you're right, let's figure out a workflow that works for everyone. Thanks all!

seandiggity commented 5 years ago

also, you can just put these in our YalePrivacyLab repo for tracker profiles, where we're also gathering new info... I will invite all of you as contributors.

IzzySoft commented 5 years ago

@seandiggity I'm not sure if I'll report trackers regularly – but sure it's good to know where to put them, and they hopefully will make their way into Exodus. You also can find my full library list (which not only contains trackers, but all kinds of libraries used in Android apps) in my GitLab repo if you're interested.

Further I'm not sure if I can provide full descriptions as you keep them in your repo. Is it OK to commit partially filled samples? Do you want them submitted directly to your repo, or via PRs?

U039b commented 5 years ago

Hi all! This issue should be closed since https://etip.exodus-privacy.eu.org has been developed in order to ease and centralize trackers categorization and description. If you want an ETIP account, feel free to send me an email to exodus@0x39b.fr specifying your desired username + email address and I will send you a temporary password. Once registered, you will be able to freely contribute to the tracker identification process.

We invite you to share/sync trackers info between ETIP and the Yale Privacy lab repo.

Cheers!

IzzySoft commented 5 years ago

Thanks for the heads-up, @U039b! Waiting for advice concerning "incomplete records" (I can't provide full ones as I've got no idea how to fill the gaps – especially network signature, Maven specifics and gradle; I'm not a dev) and "distribution guidelines". If that's permitted, I'll accept the invitation and share my findings.

Speaking of which: are there any issues with your scanner currently? For several hours now I've been told to come back later because the queue is full. Something hanging?

U039b commented 5 years ago

@IzzySoft it seems that some tasks are stuck in the queue, we will investigate ;-)

Regarding ETIP fields, network signature is a REGEX matching domain names (e.g. app-measurement.com) used by a tracker. For the other fields you mentioned, ignore them if you do not know what they mean.
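To make the network signature idea concrete, here is a minimal sketch of a regex tested against domain names. The signature string and the helper name `contacts_tracker` are assumptions for illustration; the thread does not say how ETIP or Exodus applies the pattern internally. Note the escaped dots: an unescaped `.` would match any character.

```python
import re

# Hypothetical network signature: a regex over domain names, dots escaped.
NETWORK_SIGNATURE = r"app-measurement\.com"


def contacts_tracker(domain, signature=NETWORK_SIGNATURE):
    """Return True if the signature regex is found anywhere in the domain name."""
    return re.search(signature, domain) is not None


print(contacts_tracker("app-measurement.com"))  # True
print(contacts_tracker("www.example.org"))      # False
```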

U039b commented 5 years ago

@IzzySoft no more "come back later" ;-)

IzzySoft commented 5 years ago

it seems that some tasks are stuck in the queue

that was my assumption, too.

no more "come back later"

@U039b just noticed – thanks a lot! :+1: :clap: :man_dancing: :man_cartwheeling:

Regarding ETIP fields, network signature is a REGEX matching domain names (e.g. app-measurement.com) used by a tracker.

Yes, that much I got. But it's the domain names the corresponding tracker contacts, right? I've got no idea how to figure that out. I'm just performing a basic static analysis of path names on the Smali, which is how I found some hundred libraries, the above trackers among them. So if I want someone else (here: you) to fill the gaps, you'd need a sample? Or could I simply skip this as well?

For the other fields you mentioned, ignore them if you do not know what they mean.

That's good to know! Maybe it would be a good idea to have a simple tutorial on the other repo, for folks like me who know enough to contribute but not enough to make "complete" submissions?

U039b commented 5 years ago

thanks a lot!

You are welcome!

But it's domain names the corresponding tracker contacts, right?

Domain names correspond to the remote servers contacted by the trackers to send collected data. You can find them by analyzing the network traffic of an application which uses a given tracker or by inspecting the binary looking for URLs or domains.
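The "inspect the binary for domains" route can be sketched roughly as follows. This is only an illustration, assuming the bytes have already been pulled out of the app (for example an unzipped classes.dex); the pattern and the name `find_domains` are made up for this sketch and are not Exodus' own rule. Java package names such as com.example.sdk also look like domains to this pattern, so the output is a list of candidates to review by hand.

```python
import re

# Assumed rough pattern for host names embedded in a binary blob.
DOMAIN_RE = re.compile(
    rb"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}",
    re.IGNORECASE,
)


def find_domains(blob):
    """Return the sorted, de-duplicated domain-like strings found in blob."""
    found = {m.group(0).decode("ascii").lower() for m in DOMAIN_RE.finditer(blob)}
    return sorted(found)


# Tiny synthetic blob standing in for real binary content.
sample = b"\x00\x01https://app-measurement.com/a\x00junk\x00collect9903crssb.deltadna.net\x00"
print(find_domains(sample))
```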

So if I want someone else (here: you) to fill the gaps, you'd need a sample?

Unfortunately, I am a bit busy. Anyway, once you have listed path names (you probably mean Java packages; you will find more details here), you have to check which packages correspond to a tracker. Then you can create a new tracker in ETIP and provide the information you have gathered about it.
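Going from Smali path names to Java packages could look something like the sketch below. It assumes the directory layout that apktool-style decompilers typically produce (`smali/`, `smali_classes2/`, ...); the function names and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def smali_path_to_class(path):
    """Turn 'smali/com/example/sdk/Tracker.smali' into 'com.example.sdk.Tracker'."""
    parts = PurePosixPath(path).with_suffix("").parts
    # Drop the leading output directory (smali, smali_classes2, ...).
    if parts and parts[0].startswith("smali"):
        parts = parts[1:]
    return ".".join(parts)


def packages(paths):
    """Return the sorted set of package names (class name stripped) for the paths."""
    return sorted({smali_path_to_class(p).rpartition(".")[0] for p in paths})


paths = [
    "smali/com/example/sdk/Tracker.smali",
    "smali_classes2/com/example/sdk/Util.smali",
    "smali/org/other/Main.smali",
]
print(packages(paths))
```

The resulting package list is what you would then compare against known tracker packages before creating an entry.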

Maybe it would be a good idea to have a simple tutorial

It would be nice to have tutorials for ETIP and Exodus-core, unfortunately, I do not have time :-/ But anybody can create a tutorial and we will be happy to put it at the right place ;-)

IzzySoft commented 5 years ago

Thanks @U039b – and I know exactly what you mean by "not enough time", as that's my situation, too …

And yes, that's what I meant by "static analysis" – though I use a different tool for it.

kaputnikGo commented 5 years ago

Added basic tracker submission template and first example to the Yale repo with the intention of enabling a quick and easy way to get proper new tracker info into Exodus - https://github.com/YalePrivacyLab/tracker-profiles And also to figure out the best method to add new trackers to Exodus.

seandiggity commented 5 years ago

That Taplytics profile looks great, thanks. Will make sure these go upstream, so if it's lower barrier-to-entry to submit to the YPL repo that's fine (then there's less reason to bother EP and @U039b for Etip accounts etc. as well).

kaputnikGo commented 5 years ago

Added 8 more taken from here, will keep using the commit summary with "basic tracker" to help ID when they go up in this format. fyi i check the yale tracker list, https://reports.exodus-privacy.eu.org/trackers/ and https://etip.exodus-privacy.eu.org/ before adding them, so hopefully that covers everything.

jawz101 commented 5 years ago

@U039b In the past, when I entered network signatures and gradle strings, I sometimes didn't know which regex pattern to use. Some of the gradle strings I found had version numbers in them, so I didn't know how you enter those to scan on your end. Are you all fixing our submissions when you see them like this?

Example, deltaDNA: com.deltadna.android:deltadna-sdk:4.10.0 should really be entered with a wildcard for the version, but I didn't know the syntax for that.

And from their network traffic they follow this pattern:

collect9903crssb.deltadna.net
collect9999rsstn.deltadna.net
engage10059bltbg.deltadna.net
engage10077vpspd.deltadna.net

but I just entered deltadna.net
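For what it's worth, one way to write both as regular expressions is sketched below. These patterns are assumptions for illustration only; nothing in this thread confirms the exact syntax ETIP expects, or whether it anchors the pattern. The host `notdeltadna.net` is a made-up look-alike used to show what should not match.

```python
import re

# Assumption: match group and artifact exactly, accept any dotted numeric version.
GRADLE_SIGNATURE = re.compile(r"com\.deltadna\.android:deltadna-sdk:\d+(?:\.\d+)*")

# Assumption: match deltadna.net itself and any subdomain, whatever the prefix.
NETWORK_SIGNATURE = re.compile(r"(?:^|\.)deltadna\.net$")

hosts = [
    "collect9903crssb.deltadna.net",
    "engage10077vpspd.deltadna.net",
    "deltadna.net",
    "notdeltadna.net",  # hypothetical look-alike that should not match
]

for host in hosts:
    print(host, bool(NETWORK_SIGNATURE.search(host)))

for version in ("4.10.0", "4.10.1"):
    coordinate = f"com.deltadna.android:deltadna-sdk:{version}"
    print(coordinate, bool(GRADLE_SIGNATURE.fullmatch(coordinate)))
```

Entering the bare domain is the simpler choice, but as an unanchored regex it would also match look-alike hosts.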

@IzzySoft as for the extra fields such as

Maven repository:
Artifact id:
Group id:
Gradle:

This is where I just start googling for their developer documentation. That's where it takes a little research. Say, for deltaDNA I would search Google for "blahblah sdk", here: deltaDNA sdk.

There are also sites devoted to junk like this, such as programmableweb.com. The first result takes me to their developer integration documentation, where app developers get instructions on how to add the ad code into their apps.

example documentation

And then I click around until I find the android documentation and search for words like com.deltadna (or whatever code string I found) as well as gradle & maven. Sometimes I get lucky and see a maven repository link, and I write that down: maven { url 'http://deltadna.bintray.com/android'. Then I open the url or maybe search Google again (example); a lot of them seem to use that bintray site. At this point I click around on the page and find something like https://bintray.com/deltadna/android/deltadna-sdk

which has a little thingy at the bottom that says maven and shows

<dependency>
<groupId>com.deltadna.android</groupId>
<artifactId>deltadna-sdk</artifactId>
<version>4.10.1</version>
<type>pom</type>
</dependency>

and a second tab called gradle that shows compile 'com.deltadna.android:deltadna-sdk:4.10.1'
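The maven snippet and the gradle line carry the same three values (group id, artifact id, version), so one can be derived from the other. A small sketch, using the dependency block quoted above; the function name `gradle_coordinate` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The maven dependency block as shown on the bintray page quoted above.
POM_SNIPPET = """
<dependency>
  <groupId>com.deltadna.android</groupId>
  <artifactId>deltadna-sdk</artifactId>
  <version>4.10.1</version>
  <type>pom</type>
</dependency>
"""


def gradle_coordinate(dependency_xml):
    """Build 'group:artifact:version' from a single <dependency> element."""
    dep = ET.fromstring(dependency_xml.strip())
    group = dep.findtext("groupId")
    artifact = dep.findtext("artifactId")
    version = dep.findtext("version")
    return f"{group}:{artifact}:{version}"


print(gradle_coordinate(POM_SNIPPET))  # com.deltadna.android:deltadna-sdk:4.10.1
```

So the ETIP fields would be group id com.deltadna.android and artifact id deltadna-sdk, with the version being the part that changes between releases.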

It was all guesswork to see if I could find what the etip site was looking for, but all of these 3rd party companies seem to have these sorts of steps in their documentation for integrating ads & analytics.

seandiggity commented 5 years ago

Added 8 more taken from here, will keep using the commit summary with "basic tracker" to help ID when they go up in this format. fyi i check the yale tracker list, https://reports.exodus-privacy.eu.org/trackers/ and https://etip.exodus-privacy.eu.org/ before adding them, so hopefully that covers everything.

Right, that should. We'll be adding quite a few more to the YPL repo as part of the crowdsourcing I'm doing via Mozilla Open Leaders project.

jawz101 commented 5 years ago

Over the past couple of weeks I've done quite a bit of work on the etip site. Filled in a lot of blanks on existing signatures and added maybe 20-40... I can't tell. Anyways, has anyone from the project taken a look at them?

I'd like to fix my mistakes if I did anything incorrectly. My main concerns are the format of the regex on the network signatures, and what we do when the build.gradle entries could have versions. For example, if com.example.sdk.1.2.3 is what we find, presumably other versions exist, so shouldn't we use a regular expression that matches only the part that stays constant?

Additionally, do the scans look for at least one of these identifying bits, or must all of the characteristics be present (code signature, network signature, maven & gradle info)? The reason I ask is that I went into existing entries and added maven repository information where I could find it, but it looks like some tracker SDKs give instructions to proguard their code. I wonder if that means Exodus may never see that information, so adding it may break the detection rule if Exodus identifies a tracker only when all identifying bits are present.

Also, is there a preference as to which repository volunteers should contribute to: etip or YPL? It seems like doing things twice.

jawz101 commented 5 years ago

fwiw, I've kept adding more and occasionally looked at existing entries. For example, Unity Ads is likely underreporting. After reviewing their developer documentation, a code signature of com.unity3d.ads would just pick up their legacy sdk version. Their newer sdk would be com.unity3d.services.
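One way to cover both package prefixes with a single code signature would be an alternation, sketched below. The exact signature format Exodus stores is not stated in this thread, so the regex, the function name and the sample class names are all assumptions for illustration.

```python
import re

# Assumed combined code signature covering the legacy and the newer package prefix.
UNITY_ADS_SIGNATURE = re.compile(r"com\.unity3d\.(?:ads|services)\.")


def uses_unity_ads(class_names):
    """Return True if any class name starts with one of the two package prefixes."""
    return any(UNITY_ADS_SIGNATURE.match(name) for name in class_names)


# Hypothetical class names, only meant to exercise each branch.
print(uses_unity_ads(["com.unity3d.ads.Legacy"]))        # True
print(uses_unity_ads(["com.unity3d.services.Newer"]))    # True
print(uses_unity_ads(["com.unity3d.player.Something"]))  # False
```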

Now I'm taking a look at some of the apps exodus lists as having no trackers and finding ones missed :P I don't know how to represent some of the situations as there are a lot of companies in the business of sdk's to manage an app's other trackers. As an added bonus, doing so also results in more international companies being found.

Actually, this might be a good practice moving fwd as this filter of No Tracker apps should almost be a representation of "clean" apps, which actually makes it a pretty compelling set of apps to review.

pnu-s commented 5 years ago

Hi @jawz101 !

Over the past couple of weeks I've done quite a bit of work on the etip site. Filled in a lot of blanks on existing signatures and added maybe 20-40... I can't tell. Anyways, has anyone from the project taken a look at them?

Thanks a lot for your work (and sorry for the late reply), it is greatly appreciated by the Exodus Privacy team :). Unfortunately we have not had time recently to look through the new entries in ETIP and import the data from ETIP into Exodus. We plan to work on this and will try to find some time in the coming weeks, but it is quite a tedious task.

Additionally, do the scans look for at least one of these identifying bits, or must all of the characteristics be present (code signature, network signature, maven & gradle info)? The reason I ask is that I went into existing entries and added maven repository information where I could find it, but it looks like some tracker SDKs give instructions to proguard their code. I wonder if that means Exodus may never see that information, so adding it may break the detection rule if Exodus identifies a tracker only when all identifying bits are present.

As it is explained on this page, what we look for is the signature of the tracker. So AFAIK the maven & gradle information will not affect the tracker identification.

Cheers !

pnu-s commented 5 years ago

This issue should be closed since https://etip.exodus-privacy.eu.org has been developed in order to ease and centralize trackers categorization and description.

We are now closing this issue.

If you want an ETIP account, feel free to send an email to etip@exodus-privacy.eu.org specifying your desired username + email address and we will send you a temporary password. Once registered, you will be able to freely contribute to the tracker identification process.

Thanks again to everyone contributing to the tracker identification :). Cheers !