thread_local_var performs poorly if a thread_local_var object is created each test iteration. This poor performance is caused by two oversights:
thread_local_ctx::thread_local_alloc always grows the entries_ vector, even if an existing entry is unallocated (i.e. thread_local_free was previously called on it). The entries_ vector is not cleared between iterations either. This causes O(iteration) memory usage (due to the entries_ vector) and adds O(iteration^2) time overhead (in iteration_begin).
thread_local_var::~thread_local_var never calls thread_local_ctx::thread_local_free. This means that, even if the above issue were fixed, the performance problems would persist, because no entry would ever be freed for reuse.
Squash both of these bugs:
Make thread_local_var::~thread_local_var call thread_local_ctx::thread_local_free if necessary.
Make thread_local_ctx::thread_local_alloc reuse freed entries.
Unfortunately, this commit has one negative consequence: Win32-style Tls use-after-free is undetected in some cases.
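
The two fixes and their trade-off can be illustrated with a minimal sketch. It is not the code from this commit: slot_table and slot_handle are hypothetical stand-ins for thread_local_ctx and thread_local_var, and the linear scan for a free entry is an assumed reuse strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for thread_local_ctx (names and layout are assumptions).
class slot_table {
public:
    // Returns a slot index, reusing a previously freed slot when one exists
    // and growing the vector only when every slot is in use.
    std::size_t alloc_slot() {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (!entries_[i].in_use) {
                entries_[i].in_use = true;
                return i;
            }
        }
        entries_.push_back(entry{true});
        return entries_.size() - 1;
    }

    // Marks a slot as unallocated so a later alloc_slot call can reuse it.
    void free_slot(std::size_t index) {
        assert(index < entries_.size() && entries_[index].in_use);
        entries_[index].in_use = false;
    }

    std::size_t size() const { return entries_.size(); }
    bool in_use(std::size_t index) const { return entries_[index].in_use; }

private:
    struct entry {
        bool in_use;
    };
    std::vector<entry> entries_;
};

// Hypothetical stand-in for thread_local_var: the destructor releases the slot.
class slot_handle {
public:
    explicit slot_handle(slot_table& table)
        : table_(table), index_(table.alloc_slot()) {}
    ~slot_handle() { table_.free_slot(index_); }

    slot_handle(const slot_handle&) = delete;
    slot_handle& operator=(const slot_handle&) = delete;

    std::size_t index() const { return index_; }

private:
    slot_table& table_;
    std::size_t index_;
};
```

In this sketch, creating and destroying a slot_handle in a loop keeps the table at a single entry instead of one entry per iteration. It also shows the downside: an index kept after free_slot refers to a live slot again as soon as the next alloc_slot reuses it, so an in_use check alone cannot tell a stale index from a valid one.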