Closed GammaPi closed 1 year ago
We should maintain a thread attribution clock.
This thread attribution clock cannot be updated on every API invocation, because doing so would incur an expensive atomic operation each time. We must tolerate some inaccuracy.
Another source of inaccuracy comes from thread sleep.
This approach attributes API runtime by tracking an active thread counter (attributed at API ending time). However, we cannot do the same for the application, which can only be attributed at thread creation/termination time. This difference causes inaccurate results, as shown in the previous image.
To make the two attributions consistent, we need to apply the same strategy to APIs as to the application, as shown in the previous image.
However, doing so requires more synchronization. If the synchronization overhead is large, we cannot sell this work. There are several possible ways to solve this: https://stackoverflow.com/questions/61237650/a-readers-writer-lock-without-having-a-lock-for-the-readers
We have paused the implementation for now due to time concerns. Another reason is that Scaler is inherently inaccurate (because of thread_sleep?!), so there is no need to implement an accurate attribution approach.
The approach simply classifies execution into parallel and serial phases and scales each parallel phase by a phase-dependent number calculated from the thread count.
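A minimal sketch of the phase-scaling idea, not the actual Scaler code. The `Phase` record, the function names, and in particular the scaling rule (a phase with N threads is scaled by 1/N) are assumptions made for illustration; the real phase-dependent number may be computed differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record: one interval during which the live thread count was constant.
struct Phase {
    double durationNs;      // time measured in this phase
    std::uint32_t threads;  // number of live threads during this phase
};

// Assumed scaling rule: a parallel phase with N threads is scaled by 1/N.
// Serial phases (N <= 1) are left unscaled.
inline double scaleFactor(std::uint32_t threads) {
    return threads > 1 ? 1.0 / static_cast<double>(threads) : 1.0;
}

// Sum the scaled durations of all phases.
inline double attributedTime(const std::vector<Phase>& phases) {
    double total = 0.0;
    for (const Phase& p : phases) {
        total += p.durationNs * scaleFactor(p.threads);
    }
    return total;
}
```

Under this assumed rule, a 400 ns phase with 4 threads contributes the same as a 100 ns serial phase.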
Detailed implementation is described as follows:
The main problem of this approach is inaccuracy.
Outlier removal has been dropped for now because it is a very heuristic approach. Thread attribution approach 2 has just been implemented.
Approach 1 has been implemented successfully. The verification of approach 1.
Approach 1: Centralized counter
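A rough sketch of what a centralized counter could look like, assuming a single process-wide atomic that is updated at thread creation/termination and read when an API call ends. The names `gActiveThreads`, `onThreadCreate`, `onThreadExit`, and `attributeApiTime` are hypothetical, and dividing the elapsed time by the active thread count is an assumed attribution rule, not necessarily the one used in Scaler.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical process-wide count of live threads (requires C++17 for inline variables).
inline std::atomic<std::uint32_t> gActiveThreads{0};

// Called from the thread-creation hook.
inline void onThreadCreate() {
    gActiveThreads.fetch_add(1, std::memory_order_relaxed);
}

// Called from the thread-termination hook.
inline void onThreadExit() {
    gActiveThreads.fetch_sub(1, std::memory_order_relaxed);
}

// Called when an API invocation ends: attribute the elapsed time using the
// thread count observed at that moment (assumed rule: elapsed / active threads).
inline double attributeApiTime(double elapsedNs) {
    const std::uint32_t n = gActiveThreads.load(std::memory_order_relaxed);
    return n > 1 ? elapsedNs / static_cast<double>(n) : elapsedNs;
}
```

Reading the counter only at API end keeps the hot path to one relaxed load, but the thread count may have changed during the call, which is one source of the inaccuracy discussed in this thread.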