SAP / project-foxhound

A web browser with dynamic data-flow tracking enabled in the JavaScript engine and DOM, based on Mozilla Firefox (https://github.com/mozilla/gecko-dev). It can be used to identify insecure data flows or data privacy leaks in client-side web applications.
GNU General Public License v3.0

Adding Primitive Tainting #211

Open tmbrbr opened 6 months ago

tmbrbr commented 6 months ago

This PR enables tainting for numbers and is a mergeable version of the https://github.com/SAP/project-foxhound/tree/primitaint branch.

This PR adds the following features:

There is still some work to do before the merge:

cla-assistant[bot] commented 6 months ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 7 committers have signed the CLA.

:white_check_mark: leeN
:white_check_mark: tmbrbr
:x: Samuel Groß
:x: alexbara2000
:x: 0drai
:x: LukasHock
:x: saelo


Samuel Groß does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

leeN commented 6 months ago

As the PR combines several things (that are intertwined) in various stages of being complete, I was wondering the following:

As a transitional phase, we could consider enabling the primitive tainting aspect selectively, either via a preference or a mozconfig setting. This would incur additional testing costs, as this change set touches a large part of the tainting implementation, but our testing setup seems to be in reasonable shape.

In particular, it is still unclear how best to build and combine taint flows for primitive taints, and which operations to include: if we do not record all operations we lose a lot of information, but tracking all of them blows up the runtime and memory cost for, e.g., hash functions.
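
To make the trade-off concrete, here is a minimal Python sketch (not Foxhound code). The `TaintedNumber` class, the `max_ops` cap, and the `source:screen.width` label are all hypothetical and only serve to show how quickly a per-operation flow grows in a hash-like loop.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TaintedNumber:
    """A number carrying the list of operations (its 'flow') applied to it."""
    value: int
    flow: Tuple[str, ...] = ()

    def apply(self, op: str, new_value: int,
              max_ops: Optional[int] = None) -> "TaintedNumber":
        # Hypothetical cap: once the flow holds max_ops entries, keep the
        # taint but stop recording further operations.
        if max_ops is not None and len(self.flow) >= max_ops:
            return TaintedNumber(new_value, self.flow)
        return TaintedNumber(new_value, self.flow + (op,))


def toy_hash(seed: TaintedNumber, data: bytes,
             max_ops: Optional[int] = None) -> TaintedNumber:
    """FNV-1a-style loop: two arithmetic operations per input byte."""
    h = seed
    for byte in data:
        h = h.apply("xor", h.value ^ byte, max_ops)
        h = h.apply("mul", (h.value * 16777619) & 0xFFFFFFFF, max_ops)
    return h


# Hypothetical source label; the seed value is the 32-bit FNV offset basis.
source = TaintedNumber(2166136261, ("source:screen.width",))
full = toy_hash(source, b"x" * 1000)
capped = toy_hash(source, b"x" * 1000, max_ops=8)
print(len(full.flow), len(capped.flow))  # prints "2001 8"
```

Hashing 1000 bytes records 2000 operations on top of the source entry, while the capped variant computes the same value with a flow of 8 entries, at the cost of losing everything after the cap.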

So this PR contains some easy wins which would be useful right away (adding sources via IDL attributes) and some parts which require more work. As splitting this PR in two is probably a significant effort, maybe that would be a compromise?

tmbrbr commented 6 months ago

@leeN, not a bad idea to split the PR into two parts, especially the IDL taint source attributes piece. The only tricky part will be decoupling the two, as we developed them in parallel and the commits are intertwined.

I think most of the generator modification was done in 9f242f65fcaa5c09dbb6d89fdfe44f5c80526fa7, but there may be more that needs extracting.

On the other hand, the PR in its current state compiles and runs. Perhaps we can keep the fingerprinting sources disabled by default.

leeN commented 5 months ago

I noticed another thing during debugging #213. Currently, we can't disable sources related to primitive tainting in about:config.

leeN commented 4 months ago

So, as an update/status recap from my point of view:

This works as is; it is just user-unfriendly in its current form. This is fixed once #2 lands.

So the major remaining points seem to be:

I think we can get a good grasp by running the benchmarks that mach supports (e.g., ARES-6) and performing the same tests we did for FP tracer. There, primitive tainting was fairly fast (<15% overhead on page load times).

I will look further, but I believe it might be best to decouple this change from this PR. Reworking the flow representation has benefits that go beyond primitive tainting. I certainly recall fighting this for the hand sanitizer study. As this change seems fairly disruptive (i.e., all tooling would require a significant rework), we probably have to get this right. One option would be to store the flow internally as a directed graph and then offer code to linearize the flows into their current form. My impression is that the current Foxhound users mainly use Playwright (we can make an example available here) or extensions (same applies). This would preserve compatibility with existing tooling while offering the benefits of storing a graph instead of a bunch of vectors. However, this impression might be completely wrong, as I have limited insight into how others use Foxhound!
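
As a rough illustration of the graph idea, the following Python sketch stores a flow as a directed acyclic graph and flattens it into one operation list per source-to-sink path. The `FlowNode` type, the `linearize` helper, and the source/sink labels are hypothetical, and the output format only approximates what existing tooling consumes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlowNode:
    """One operation in a taint flow; parents are the operands it derives from."""
    operation: str
    parents: Tuple["FlowNode", ...] = ()


def linearize(node: FlowNode) -> List[List[str]]:
    """Return every source-to-node path as a flat list of operation names."""
    if not node.parents:
        return [[node.operation]]
    paths: List[List[str]] = []
    for parent in node.parents:
        for prefix in linearize(parent):
            paths.append(prefix + [node.operation])
    return paths


# Two hypothetical primitive sources combined by one multiplication.
width = FlowNode("source:screen.width")
height = FlowNode("source:screen.height")
area = FlowNode("mul", (width, height))
sink = FlowNode("sink:fetch.body", (FlowNode("toString", (area,)),))

for path in linearize(sink):
    print(" -> ".join(path))
# source:screen.width -> mul -> toString -> sink:fetch.body
# source:screen.height -> mul -> toString -> sink:fetch.body
```

The graph stores the shared `mul` and `toString` nodes once, whereas the linearized form repeats them per path. The number of paths can grow quickly when many nodes are shared, so linearization would probably have to be done on demand.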

So, to sum up this point: I believe we should decouple the flow representation from this PR and offer to disable flow creation for primitive sources via the configuration, similar to how we allow disabling taint sources. This would allow us to merge this PR earlier and avoid divergence between mainline Foxhound and primitaint Foxhound.
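
One possible reading of this proposal, sketched in Python: primitive values stay tainted, but no flow is recorded for them unless a switch is turned on. The preference names and the `taint_from_source` helper below are hypothetical and are not existing Foxhound settings.

```python
from typing import Dict, List, Optional

# Hypothetical preference names, for illustration only.
DEFAULT_PREFS: Dict[str, bool] = {
    "tainting.source.example.string": True,
    "tainting.source.example.number": True,
    "tainting.primitive.flows": False,
}


def taint_from_source(source: str, kind: str,
                      prefs: Dict[str, bool]) -> Optional[List[str]]:
    """Return the initial flow for a value read from `source`.

    Returns None if the value stays untainted. `kind` is "string" or
    "primitive".
    """
    if not prefs.get(f"tainting.source.{source}", False):
        return None  # source disabled: the value is not tainted at all
    if kind == "primitive" and not prefs.get("tainting.primitive.flows", False):
        return []  # tainted, but flow creation is switched off
    return [f"source:{source}"]


print(taint_from_source("example.number", "primitive", DEFAULT_PREFS))
print(taint_from_source("example.string", "string", DEFAULT_PREFS))
```

With the defaults above, the primitive source yields an empty flow while the string source still records its origin.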

tmbrbr commented 4 months ago

I agree that we should avoid diverging the main and primitaint branches; the idea of this PR is to have something we can merge to main.

I would also be in favour of splitting up the work if possible. As you suggest, I think a full-blown move to graph-based taint flows is going to be a big effort and should be handled in a separate PR (if at all...).