Open ivoanjo opened 2 weeks ago
Benchmark execution time: 2024-11-28 15:28:02
Comparing candidate commit 3cb2cb08 in PR branch ivoanjo/prof-9476-managed-string-storage-try3-clean with baseline commit 17f36e91 in branch main.
Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.
Omitted due to size.
Forgot to comment sooner, but I tried out using this in the Python profiler. Commit here, very WIP because I was trying out other approaches as well at the time. The biggest issue I ran into was around forking. If the program forks while the RwLock in the table is held, the child process will deadlock trying to access the table. I think we'd need to just make a new table on fork. Not the end of the world, but I guess just something to look out for.
> If the program forks while the RwLock in the table is held, the child process will deadlock trying to access the table. I think we'd need to just make a new table on fork. Not the end of the world, but I guess just something to look out for.
Yeap, this is a good point. I think even without the lock, it's probably not a great idea to do anything other than throw away the table after the fork.
(It would, on the other hand, be cool to have a lock-free implementation for this, but I fear the cost would probably not be great).
Would having a way to clear the table ignoring the lock work as a temporary solution for Python? How do y'all handle this issue for the regular profile data structures? Reset on fork as well?
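For what it's worth, here's a minimal sketch of the "throw away the table after fork" idea, assuming the storage is reachable through an Arc<RwLock<...>> handle (the type and method names below are illustrative, not the actual libdatadog ones):
```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical, simplified stand-in for the managed string table handle;
// not the real libdatadog type.
type StringTable = Arc<RwLock<HashMap<String, u32>>>;

struct Profiler {
    strings: StringTable,
}

impl Profiler {
    // After fork, the child's copy of the RwLock can still appear "held" by a
    // thread that no longer exists there, so locking it would deadlock.
    // Instead of trying to recover the old table, drop this handle and start
    // over with an empty one.
    fn on_fork_in_child(&mut self) {
        self.strings = Arc::new(RwLock::new(HashMap::new()));
    }
}
```
In the Python profiler this could be hooked up via os.register_at_fork(after_in_child=...); either way, any ids interned by the parent would have to be treated as invalid and re-interned in the child.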
Attention: Patch coverage is 8.28283% with 454 lines in your changes missing coverage. Please review.
Project coverage is 69.86%. Comparing base (17f36e9) to head (3cb2cb0).
What does this PR do?
This PR builds on the work started by @alexjf on #607 to introduce "managed string storage" for profiling.
The idea is to introduce another level of string storage for profiling that is decoupled in lifetime from individual profiles, and that is managed by the libdatadog client.
At its core, managed string storage provides a hashtable that stores strings and returns ids. These ids can then be provided to libdatadog instead of CharSlices when recording profiling samples.
For FFI users, this PR adds the following APIs to manage strings:
ddog_prof_ManagedStringStorage_new
ddog_prof_ManagedStringStorage_intern(String)
ddog_prof_ManagedStringStorage_unintern(id)
ddog_prof_ManagedStringStorage_advance_gen
ddog_prof_ManagedStringStorage_drop
ddog_prof_ManagedStringStorage_get_string
A key detail of the current implementation is that each intern call with the same string will increase an internal usage counter, and each unintern call will reduce the counter. Then, at advance_gen time, if the counter is zero, we get rid of the string.
Then, to interact with profiles, there's a new ddog_prof_Profile_with_string_storage API to create a profile with a given ManagedStringStorage, and all structures that make up a Sample (Mapping, Function, Label, etc.) have been extended so that they take either a CharSlice or a ManagedStringId. Thus, after interning all strings for a sample, it's possible to add a sample to a profile entirely by referencing strings by ids, rather than CharSlices.
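To make the "CharSlice or ManagedStringId" choice more concrete, here's a rough sketch of its shape; the type and field names are purely illustrative and don't match the actual FFI structs:
```rust
// Illustrative only: stands in for the "CharSlice or ManagedStringId" choice
// described above; not the actual libdatadog FFI types.
enum StringRef<'a> {
    Slice(&'a str), // like passing a CharSlice: the bytes travel with the sample
    Managed(u32),   // like passing a ManagedStringId: resolved via the shared storage
}

struct Function<'a> {
    name: StringRef<'a>,
    filename: StringRef<'a>,
}

fn build_functions() -> (Function<'static>, Function<'static>) {
    // Existing path: provide the string bytes every time a sample is recorded.
    let by_slice = Function {
        name: StringRef::Slice("allocate"),
        filename: StringRef::Slice("app.rb"),
    };
    // New path: intern once, then reference the string purely by id.
    let by_id = Function {
        name: StringRef::Managed(42),
        filename: StringRef::Managed(43),
    };
    (by_slice, by_id)
}
```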
Motivation
The initial use-case is to support heap profiling -- "samples" related to heap profiling usually live across multiple profiles (as long as a given object is alive) and so this data must be kept somewhere. Previously for Ruby we were keeping this on the Ruby profiler side, but having libdatadog manage this instead presents a few optimization opportunities.
We also hope to replace a few other "string tables" that other profilers had to build outside of libdatadog for similar use-cases.
This topic was also discussed in the following two documents (Datadog-only, sorry!):
Additional Notes
In keeping with the experimental nature of this feature, I've tried really hard to not disturb existing profiling API users with the new changes.
That is -- I was going for, if you're not using managed string storage, you should NOT be affected AT ALL by it -- be it API changes or overhead.
(This is why on the pure-Rust profiling crate side, I ended up duplicating a bunch of structures and functions. I couldn't think of a great way to not disturb existing API users other than introducing alternative methods, but to be honest the duplication is all in very simple methods so I don't think this substantially increases complexity/maintenance vs trying to be smarter to bend Rust to our will.)
There's probably a lot of improvements we can make, but with this PR I'm hoping to have something in a close to "good enough" state, that we can merge this in and then start iterating on master, rather than have this continue living in a branch for a lot longer.
This doesn't mean we shouldn't fix or improve things before merging, but I'll be trying to identify what needs to go in now and what can go in as separate, follow-up PRs.
As an addendum, there's still a bunch of expects sprinkled around that should be turned into proper errors. I plan to do a pass on all of those. (But again, none of the panics affect existing code, so they're harmless and inert unless you're experimenting with the new APIs.)
How to test the change?
The branch in https://github.com/DataDog/dd-trace-rb/tree/ivoanjo/prof-9476-managed-string-storage-try2 is where I'm testing the changes on the Ruby profiler side.
It may not be entirely up-to-date with the latest ffi changes on the libdatadog side (I've been prettying up the API), but it shows how to use this concept, while passing all the profiling unit/integration tests, and has shown improvements in memory and latency in the reliability environment.