Closed trws closed 3 months ago
Ok, took a bit of non-trivial digging to work out how to address the original issue here. The first version worked great on all current-gen compilers, and failed on anything even vaguely behind. It was all non-determinism in cleanup of statics. Making that all work when the backing store and the strings themselves need to be able to have static or non-static lifetime took a little bit of reworking. To explain what's going on a little bit for posterity and reviewers:
There was also an issue with destructing the last string causing the reference count of the storage to drop to zero, causing it to be freed before the end of the critical section using a lock stored in the storage block. We now take a reference to it across that region to ensure destruction happens after we release the lock.
Sorry for the long wait. I'll review this PR today.
Attention: Patch coverage is 92.45283%
with 8 lines
in your changes missing coverage. Please review.
Project coverage is 75.4%. Comparing base (
d91b727
) to head (db24532
). Report is 131 commits behind head on master.
Files with missing lines | Patch % | Lines |
---|---|---|
src/common/libintern/interner.cpp | 82.9% | 8 Missing :warning: |
problem: while subsystems are a very small stable set that users don't produce input for, resource types are both dynamic and often coming from untrusted user input, so immortal strings would mean a potential attack vector for resource exhaustion.
solution:
dense_storage
intern string stores an integer in the range 0-num-strings in its group, with the group identified by a tag type, largely like before but a bit more ergonomic for use in a large project.weak_ptr<const std::string>
that refers to the shared_ptr version that is passed back to be used in the interned_string by the user. The shared_ptr is given a deleter that automatically removes the string from the unordered_map when the last copy is destroyed. The overall effect is that we get interned strings that are completely reference counted, and can be cleaned up safely either by running out of references to them in user code, or by the static table being cleaned up at the end of the run. The end-of-run cleanup is the reason for the extra shared_ptr dance in get_sparse_inner_storage, because otherwise it was possible for the table to be reclaimed before some long-lived rc strings, which caused issues. The full test suite has been run with ASAN to verify no use-after-frees or other issues remain in this version (at least that get hit by our test suite)A note on performance. Using refcounted strings like this is slower than using non-refcounted immortal ones, especially because the reference counts are atomic. That said, the difference in performance for the many-points test was small enough to be considered in the noise. If it comes up, we could swap out for a per-thread refcounting setup or something else, but this lets us avoid rebuilding too much infrastructure.