[FEATURE] Improve per-cpu map performance

yunwei37 commented 3 weeks ago

Is your feature request related to a problem? Please describe.

The per-CPU maps have much higher overhead in userspace than in the kernel, which should be fixed.

Map Operation	Kernel (op - uprobe) (ns)	Userspace (op - uprobe) (ns)
__bench_hash_map_update	62827.533320	30296.051630
__bench_hash_map_lookup	15895.166920	23005.369380
__bench_hash_map_delete	19884.933980	13054.965970
__bench_array_map_update	9538.564600	6701.987970
__bench_array_map_lookup	183.155140	4305.515170
__bench_array_map_delete	216.088950	5987.507820
__bench_per_cpu_hash_map_update	33140.184290	95537.666900
__bench_per_cpu_hash_map_lookup	14089.238230	62913.855920
__bench_per_cpu_hash_map_delete	19753.563580	459826.428910
__bench_per_cpu_array_map_update	8885.238500	25728.928170
__bench_per_cpu_array_map_lookup	1838.737400	8759.420790
__bench_per_cpu_array_map_delete	1867.948100	4802.404130

We need to profile and fix that.
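As a starting point for deciding what to profile first, here is a minimal sketch that computes the userspace/kernel ratio for each row of the table above. The timings are copied from the table; the helper names `slowdown` and `rank_by_slowdown` are made up for illustration and are not part of bpftime.

```python
# Timings in ns, copied from the table above: (kernel, userspace).
TIMINGS_NS = {
    "__bench_hash_map_update": (62827.533320, 30296.051630),
    "__bench_hash_map_lookup": (15895.166920, 23005.369380),
    "__bench_hash_map_delete": (19884.933980, 13054.965970),
    "__bench_array_map_update": (9538.564600, 6701.987970),
    "__bench_array_map_lookup": (183.155140, 4305.515170),
    "__bench_array_map_delete": (216.088950, 5987.507820),
    "__bench_per_cpu_hash_map_update": (33140.184290, 95537.666900),
    "__bench_per_cpu_hash_map_lookup": (14089.238230, 62913.855920),
    "__bench_per_cpu_hash_map_delete": (19753.563580, 459826.428910),
    "__bench_per_cpu_array_map_update": (8885.238500, 25728.928170),
    "__bench_per_cpu_array_map_lookup": (1838.737400, 8759.420790),
    "__bench_per_cpu_array_map_delete": (1867.948100, 4802.404130),
}


def slowdown(kernel_ns, userspace_ns):
    """Return how many times slower userspace is than the kernel."""
    return userspace_ns / kernel_ns


def rank_by_slowdown(timings):
    """Return (operation, ratio) pairs, largest ratio first."""
    ratios = {op: slowdown(k, u) for op, (k, u) in timings.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for op, ratio in rank_by_slowdown(TIMINGS_NS):
        print(f"{op:36s} {ratio:7.2f}x")
```

A ratio above 1 means the userspace operation is slower than the kernel one; a ratio below 1 means it is faster.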

Officeyutong commented 3 weeks ago

Are the per-CPU maps still using locks and CPU affinity to stay safe under parallel access?

yunwei37 commented 3 weeks ago

It seems the per-CPU maps do not use locks for safety.
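For context, the general idea behind a lock-free per-CPU map can be sketched as below. This is a toy model of the concept only, not bpftime's implementation: the class `PerCpuArray` and its methods are hypothetical, and it assumes a writer stays on one CPU for the duration of an operation.

```python
class PerCpuArray:
    """Toy per-CPU array: each CPU owns a private row of values."""

    def __init__(self, n_cpus, n_entries):
        # One independent row per CPU, so rows are never shared between CPUs.
        self.slots = [[0] * n_entries for _ in range(n_cpus)]

    def update(self, cpu, index, value):
        # A writer running on `cpu` touches only its own row; no lock is taken.
        # This is only safe if the writer is not migrated to another CPU
        # in the middle of the operation (assumption of this sketch).
        self.slots[cpu][index] = value

    def lookup(self, cpu, index):
        # Read the value as seen by a single CPU.
        return self.slots[cpu][index]

    def lookup_all(self, index):
        # Read one value per CPU, for example to aggregate them afterwards.
        return [row[index] for row in self.slots]


if __name__ == "__main__":
    counters = PerCpuArray(n_cpus=4, n_entries=2)
    counters.update(cpu=1, index=0, value=7)
    counters.update(cpu=3, index=0, value=5)
    print(sum(counters.lookup_all(0)))
```

In this model the cost of an update is a single indexed write, so any extra overhead in a real implementation would have to come from somewhere else, such as finding the current CPU or locating the row.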

The hash map overhead comes from the map implementation. The array map needs more profiling.

eunomia-bpf / bpftime