apache / brpc

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" means "better RPC".
https://brpc.apache.org
Apache License 2.0

Is the frequency used to convert the TSC to time in cpuwide_time_ns inaccurate? #2409

Open zhaixiaojuan opened 1 year ago

zhaixiaojuan commented 1 year ago

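When cpuwide_time_ns computes time from the TSC, the frequency it uses does not match the TSC frequency. Details follow.

cpuwide_time_ns has two ways of computing time:
(1) Call clock_gettime (that is, the function monotonic_time_ns) to get the time.
(2) Compute the time as tsc/cpu_freq. I have two questions about this path:

a. cpu_freq comes from the function read_cpu_frequency, which reads the CPU frequency from /proc/cpuinfo. The code checks constant_tsc and nonstop_tsc to make sure the TSC ticks at a constant rate, but according to material I found online, the TSC frequency has no direct relationship to the CPU frequency, so the frequency used here is not accurate.

b. read_cpu_frequency reads only the first "cpu MHz" entry in /proc/cpuinfo. On a multiprocessor machine (even when both constant_tsc and nonstop_tsc are present), the cpu MHz values can differ between processors. Below are the cpu MHz values on a multiprocessor x86 Debian 12 machine: processors 0 and 1 report about 3800, but processor 9 reports 2900. The rdtsc instruction reads the timestamp counter of the current processor, and the current processor is not necessarily processor 0.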

root@amd64-14:/home/data# uname -a
Linux amd64-14 6.1.0-12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.52-1 (2023-09-07) x86_64 GNU/Linux
root@amd64-14:/home/data# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@amd64-14:/home/data# cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 165
model name  : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping    : 5
microcode   : 0xf8
cpu MHz     : 3799.996
cache size  : 16384 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds
bogomips    : 5799.77
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 165
model name  : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping    : 5
microcode   : 0xf8
cpu MHz     : 3800.000
cache size  : 16384 KB
physical id : 0
siblings    : 16
core id     : 1
cpu cores   : 8
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds
bogomips    : 5799.77
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
......
processor   : 9
vendor_id   : GenuineIntel
cpu family  : 6
model       : 165
model name  : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
stepping    : 5
microcode   : 0xf8
cpu MHz     : 2900.000
cache size  : 16384 KB
physical id : 0
siblings    : 16
core id     : 1
cpu cores   : 8
apicid      : 3
initial apicid  : 3
fpu     : yes
fpu_exception   : yes
cpuid level : 22
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds
bogomips    : 5799.77
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

Could we consider using clock_gettime uniformly to get the time?
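To make the size of the possible error concrete, here is a minimal sketch of the tsc/cpu_freq conversion. The helper names `ticks_to_ns` and `error_ns_per_second` are hypothetical and are not brpc's code. The example figures also rest on an assumption that the post does not confirm: that the TSC on this machine ticks at roughly the nominal 2.90 GHz, while the first "cpu MHz" entry reads about 3800.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not brpc's implementation): convert a TSC tick count
// to nanoseconds for a given frequency in Hz.
inline int64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz) {
    assert(freq_hz > 0);
    // Split into whole seconds and a remainder so the multiplication by 1e9
    // stays within 64 bits for realistic frequencies (a few GHz).
    const uint64_t sec = ticks / freq_hz;
    const uint64_t rem = ticks % freq_hz;
    return static_cast<int64_t>(sec * 1000000000ULL +
                                rem * 1000000000ULL / freq_hz);
}

// Signed error, in nanoseconds per real second, when the conversion divides
// by assumed_hz while the counter really ticks at actual_hz.
inline int64_t error_ns_per_second(uint64_t actual_hz, uint64_t assumed_hz) {
    // In one real second the counter advances by actual_hz ticks.
    return ticks_to_ns(actual_hz, assumed_hz) - 1000000000LL;
}
```

Under that assumption, `error_ns_per_second(2900000000, 3800000000)` is about -237 ms: the computed clock would advance only about 0.76 s per real second. If the real TSC frequency is closer to the value read from /proc/cpuinfo, the error shrinks accordingly.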

wwbmmm commented 1 year ago

Would switching to clock_gettime hurt performance? cpuwide_time is called quite frequently.
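One way to answer the performance question on a given machine is to measure both paths directly. The sketch below is only a rough micro-benchmark. The helper names `monotonic_ns`, `avg_clock_gettime_ns`, and `avg_rdtsc_ns` are hypothetical, not brpc functions, and the rdtsc part assumes an x86 target with GCC or Clang (elsewhere it returns -1).

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Hypothetical helper: CLOCK_MONOTONIC in nanoseconds.
inline int64_t monotonic_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Average wall-clock cost, in nanoseconds, of one clock_gettime call.
inline double avg_clock_gettime_ns(int iters) {
    assert(iters > 0);
    volatile int64_t sink = 0;  // keeps the loop from being optimized away
    const int64_t begin = monotonic_ns();
    for (int i = 0; i < iters; ++i) sink = monotonic_ns();
    const int64_t end = monotonic_ns();
    (void)sink;
    return static_cast<double>(end - begin) / iters;
}

// Average wall-clock cost, in nanoseconds, of one rdtsc read.
// Returns -1 on non-x86 targets, where rdtsc does not exist.
inline double avg_rdtsc_ns(int iters) {
    assert(iters > 0);
#if defined(__x86_64__) || defined(__i386__)
    volatile uint64_t sink = 0;
    const int64_t begin = monotonic_ns();
    for (int i = 0; i < iters; ++i) sink = __rdtsc();
    const int64_t end = monotonic_ns();
    (void)sink;
    return static_cast<double>(end - begin) / iters;
#else
    return -1.0;
#endif
}
```

Calling both functions with a few million iterations and comparing the results gives a per-call cost for each path. The numbers depend on the kernel's clock source and on whether clock_gettime is served from the vDSO, so they should be measured on the target machine.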

Ryan-Git commented 8 months ago

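On Intel CPUs, Linux performs cross-CPU calibration at boot, so the third point is basically not a problem. Using the CPU frequency directly can indeed be slightly off; on the machine I have at hand it is roughly 2200.000 vs 2194.843. You can follow DPDK's approach and read cpuid.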
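A minimal sketch of reading the TSC frequency from CPUID, assuming an Intel CPU that enumerates leaf 0x15 and a GCC or Clang toolchain that provides `<cpuid.h>`. The function names are hypothetical, and this is not claimed to match DPDK's code line for line. Per the Intel SDM, leaf 0x15 returns the denominator (EAX) and numerator (EBX) of the TSC to core-crystal-clock ratio, plus the crystal frequency in Hz (ECX).

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// TSC frequency in Hz from the CPUID leaf 0x15 fields:
//   tsc_hz = crystal_hz * numerator / denominator
// Returns 0 when any field is not enumerated (reported as zero).
inline uint64_t tsc_hz_from_leaf15(uint32_t denominator, uint32_t numerator,
                                   uint32_t crystal_hz) {
    if (denominator == 0 || numerator == 0 || crystal_hz == 0) return 0;
    return static_cast<uint64_t>(crystal_hz) * numerator / denominator;
}

// Hypothetical wrapper: query leaf 0x15 on the current CPU.
// Returns 0 if the leaf is unsupported or the target is not x86, in which
// case the caller needs another source (for example, calibration against
// clock_gettime).
inline uint64_t read_tsc_hz_from_cpuid() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx)) return 0;
    return tsc_hz_from_leaf15(eax, ebx, ecx);
#else
    return 0;
#endif
}
```

As an illustrative input (not taken from the machine in this thread), a 24 MHz crystal with a ratio of 242/2 gives 2,904,000,000 Hz. Some CPUs report ECX as zero, so a fallback path is still needed.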