SAP / project-foxhound

A web browser with dynamic data-flow tracking enabled in the Javascript engine and DOM, based on Mozilla Firefox (https://github.com/mozilla/gecko-dev). It can be used to identify insecure data flows or data privacy leaks in client-side web applications.
GNU General Public License v3.0
82 stars 16 forks source link

Memory Leak of StringTaints #61

Closed tmbrbr closed 2 years ago

tmbrbr commented 2 years ago

The StringTaint implementation is leaking memory, for example in SpiderMonkey.

The following JavaScript sample:

s = String.tainted("hello");
s = s.replace("l","o");

t = s;
for (let i = 0; i < 100; i++) {
    t = t + s;
}

console.log(t);

when run with valgrind (using suggestions from here):

G_SLICE=always-malloc valgrind --leak-check=full --smc-check=all-non-file --vex-iropt-register-updates=allregs-at-mem-access --show-mismatched-frees=no --log-file="valgrind.log" --read-inline-info=yes obj-tf-spider/dist/bin/js taint-test.js

gives the following output:

...
==2729935== 1,830,552 (2,136 direct, 1,828,416 indirect) bytes in 89 blocks are definitely lost in loss record 26 of 26
==2729935==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2729935==    by 0xC81EFA: StringTaint::operator=(StringTaint const&) (taint/Taint.cpp:437)
==2729935==    by 0x4D9757: init (js/src/vm/StringType-inl.h:207)
==2729935==    by 0x4D9757: new_<js::CanGC> (js/src/vm/StringType-inl.h:236)
==2729935==    by 0x4D9757: JSString* js::ConcatStringsQuiet<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JSString*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JSString*, (js::AllowGC)1>::HandleType, js::gc::InitialHeap) (js/src/vm/StringType.cpp:1003)
==2729935==    by 0x4DA2CB: JSString* js::ConcatStrings<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JSString*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JSString*, (js::AllowGC)1>::HandleType, js::gc::InitialHeap) (js/src/vm/StringType.cpp:1026)
==2729935==    by 0xA11A53C6225: ???
==2729935==    by 0x7198DC7: ???
==2729935==    by 0xA11A53C454E: ???
==2729935==    by 0x80F081: EnterBaseline (js/src/jit/BaselineJIT.cpp:143)
==2729935==    by 0x80F081: js::jit::EnterBaselineInterpreterAtBranch(JSContext*, js::InterpreterFrame*, unsigned char*) (js/src/jit/BaselineJIT.cpp:212)
==2729935==    by 0x2B52E2: Interpret(JSContext*, js::RunState&) (js/src/vm/Interpreter.cpp:2201)
==2729935==    by 0x2B49B5: js::RunScript(JSContext*, js::RunState&) (js/src/vm/Interpreter.cpp:397)
==2729935==    by 0x2C4F86: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>, JS::Handle<JS::Value>, js::AbstractFramePtr, JS::MutableHandle<JS::Value>) (js/src/vm/Interpreter.cpp:780)
==2729935==    by 0x39BE50: JS_ExecuteScript(JSContext*, JS::Handle<JSScript*>) (js/src/vm/CompilationAndEvaluation.cpp:539)
==2729935== 
==2729935== LEAK SUMMARY:
==2729935==    definitely lost: 2,472 bytes in 103 blocks
==2729935==    indirectly lost: 1,852,160 bytes in 10,211 blocks
==2729935==      possibly lost: 0 bytes in 0 blocks
==2729935==    still reachable: 0 bytes in 0 blocks
==2729935==         suppressed: 0 bytes in 0 blocks
==2729935== 
==2729935== Use --track-origins=yes to see where uninitialised values come from
==2729935== For lists of detected and suppressed errors, rerun with: -s
==2729935== ERROR SUMMARY: 7786 errors from 45 contexts (suppressed: 0 from 0)

The reason for the leak is that StringTaint stores a vector of TaintRanges in a pointer which is managed by StringTaint. Actually, the caller has to ensure that clear() is called before the StringTaint is destroyed. In most cases this happens when a StringType is destroyed by the SpiderMonkey VM (via the finalize method, but is not happening in all instances.

Attaching a debugged and watching the location of one of the created (but not finalized) Strings, shows that the JS Cell is only altered when the program terminates. Here's the Stack:

Thread 1 "js" hit Hardware watchpoint 2: *0xb0c076018c0

Old value = 560
New value = 0
__memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:151
151 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) where
#0  __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:151
#1  0x0000555555b95e51 in mozilla::detail::AtomicBase<unsigned long, (mozilla::MemoryOrdering)0>::AtomicBase (this=<optimized out>)
    at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/mozilla/Atomics.h:291
#2  mozilla::detail::AtomicBaseIncDec<unsigned long, (mozilla::MemoryOrdering)0>::AtomicBaseIncDec (this=<optimized out>) at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/mozilla/Atomics.h:335
#3  mozilla::Atomic<unsigned long, (mozilla::MemoryOrdering)0, void>::Atomic (this=<optimized out>) at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/mozilla/Atomics.h:389
#4  js::gc::MarkBitmap::MarkBitmap (this=0xb0c076018e8) at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/js/HeapAPI.h:202
#5  js::gc::TenuredChunkBase::TenuredChunkBase (this=0xb0c07600000, runtime=0x0) at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/js/HeapAPI.h:240
#6  js::gc::TenuredChunk::TenuredChunk (this=0xb0c07600000, runtime=0x0) at /mnt/workspace/project-foxhound/js/src/gc/Heap.h:624
#7  js::gc::TenuredChunk::init (this=0xb0c07600000, gc=0x5555574686c8, allMemoryCommitted=false) at /mnt/workspace/project-foxhound/js/src/gc/Allocator.cpp:834
#8  0x0000555555bc2b0f in js::NurseryDecommitTask::run (this=0x55555746ad48, lock=...) at /mnt/workspace/project-foxhound/js/src/gc/Nursery.cpp:178
#9  0x0000555555bb39df in js::GCParallelTask::runTask (this=0x55555746ad48, lock=...) at /mnt/workspace/project-foxhound/js/src/gc/GCParallelTask.cpp:175
#10 js::GCParallelTask::runFromMainThread (this=0x55555746ad48) at /mnt/workspace/project-foxhound/js/src/gc/GCParallelTask.cpp:150
#11 js::Nursery::disable (this=0x55555746a9a0) at /mnt/workspace/project-foxhound/js/src/gc/Nursery.cpp:354
#12 0x0000555555ba031a in js::gc::GCRuntime::finish (this=0x5555574686c8) at /mnt/workspace/project-foxhound/js/src/gc/GC.cpp:841
#13 0x00005555558e6e36 in JSRuntime::destroyRuntime (this=0x555557468190) at /mnt/workspace/project-foxhound/js/src/vm/Runtime.cpp:307
#14 0x000055555583673d in js::DestroyContext (cx=0x5555574731e0) at /mnt/workspace/project-foxhound/js/src/vm/JSContext.cpp:238
#15 0x0000555555670efa in main::$_3::operator() (this=<optimized out>) at /mnt/workspace/project-foxhound/js/src/shell/js.cpp:12628
#16 mozilla::ScopeExit<main::$_3>::~ScopeExit (this=<optimized out>) at /mnt/workspace/project-foxhound/obj-tf-spider/dist/include/mozilla/ScopeExit.h:106
#17 main (argc=<optimized out>, argv=<optimized out>) at /mnt/workspace/project-foxhound/js/src/shell/js.cpp:12833
(gdb) 

Where it looks like instead of clearing up objects individually, the memory is simply overwritten (I guess this is quicker). For regular Strings this is not a problem as any char buffers associated to the object will also be stored in the SpiderMonkey Heap somewhere and also be deleted.

In the case of the TaintRange vector (which lives outside the JS Heap), the pointer is overwritten and therefore memory lost.

Note: The motivation for using regular memory management rather than the SpiderMonkey one is to make the Taint code portable between SpiderMonkey and the other String representations in Firefox, such as DOMStrings and nsStrings.

tmbrbr commented 2 years ago

Tentative plan for a fix is to Template / Subclass StringTaint so that implementations need to provide memory management callbacks.

The JS implementation would assign memory only on the JS Heap, so will be removed with the rest of the String. A vanilla std implementation can still be used which uses new and delete operators as necessary.

Challenge will be how to port a std::vector into a SpiderMonkey JS object.

Some hopefully helpful links:

tmbrbr commented 2 years ago

A summary of the problem so far:

The nursery collection is done here: https://github.com/SAP/project-foxhound/blob/main/js/src/gc/Nursery.cpp#L1221

The problem being that objects are deleted by overwriting memory in blocks, so no finalize is called. This is not an issue normally, as the nursery also keeps track of all buffers allocated etc. and just deletes everything.

An alternative workaround would be to do an extra scan of objects remaining in the nursery and delete their taint information.

More info here: https://hacks.mozilla.org/2014/09/generational-garbage-collection-in-firefox/