Closed ywkaras closed 3 months ago
Should this be added to 10.0.x project as well for cherry-pick?
Should this be added to 10.0.x project as well for cherry-pick?
Debatable. Any crashes would happen at shutdown, and seem infrequent (with limited testing in Yahoo Prod).
This is where the segment violation is occurring in Yahoo Prod: https://github.com/apache/trafficserver/blob/5f52c374dfcc3dc223a7c23bd885bac64ec2398f/src/tsutil/Regex.cc#L114
Could also look at how Metrics storage lifetime outlasts the threads and see if this would work for _Data lifetime:
https://github.com/apache/trafficserver/blob/master/src/tsutil/Metrics.cc#L34-L42
Yes, it would not be a logic error in creating automatic, function scope instances of RegexContext when one is needed. The issue is whether the performance hit is tolerable.
On the other hand, this change involves multiple threads/cores hardware locking a variable in cache, which is also a potentially significant performance hit.
Could also look at how Metrics storage lifetime outlasts the threads and see if this would work for _Data lifetime:
https://github.com/apache/trafficserver/blob/master/src/tsutil/Metrics.cc#L34-L42
Because of https://github.com/apache/trafficserver/blob/0a53513961cbe55a6bfde32be208c0e04a70b0f1/src/tsutil/DbgCtl.cc#L161 , RegexContext::get_instance() cannot cause the construction of a thread_local variable with a non-trivial destructor. The Metrics code is not limited by that restriction.
I guess I should look into changing DbgCtl:::_new_reference() so it doesn't call tag_activated() with its mutex locked.
Closing this because I think https://github.com/apache/trafficserver/pull/11416 is better.
These changes should prevent (low-frequency) crashes that are being seen in Yahoo Prod.