futurewei-cloud / alcor

Alcor: Cloud native SDN platform powered by Kubernetes and Istio
MIT License
32 stars 33 forks source link

Warm up gRPC channels and add log for NCM performance analysis #670

Closed zzxgzgz closed 3 years ago

zzxgzgz commented 3 years ago

This PR does the following:

  1. Added logs in NCM, so that it can be used for performance analysis, along with ACA's log and Test Controller's log.
  2. Added code for NCM's gRPC channels so that users can configure how many gRPC channels NCM establishes to each ACA, also how many warmup GoalStates each channel will send when it is first instantiated.

Notes:

  1. To enable the logging for performance, please change the logging.level.root in services/network_config_manager/src/main/resources/application.properties to DEBUG, the DEBUG level equals the Level.FINE in the java code.
  2. When a coder sets some values in application.properties, in order to retrive the values correctly, you need to retrieve them in a class that has either @Component or @Bean, otherwise, it doesn't work and the value is always the default value of that class, such as 0 for an int, or null for other Objects.
lgtm-com[bot] commented 3 years ago

This pull request introduces 22 alerts when merging bcd0d030de447d03414e246d46e2c91ef3fbd40b into 559f2bd90ec4273dfef10024e657739bd4fb6c15 - view on LGTM.com

new alerts:

zzxgzgz commented 3 years ago

Adding a sample output of the analysis for your reference:

$ python3 analyze.py ./aca.log ./ncm_log_x.log ./tc.log 
Hi
Argument: analyze.py
Argument: ./aca.log
Argument: ./ncm_log_x.log
Argument: ./tc.log
This file has 26497 lines
For keyword [Elapsed time for update goalstate operation took:], there are 33 lines
For On-demand GS process time, the min is 0.0 ms, max is 60.0 ms, mean is 19.000 ms, median is 21.0
For keyword [T3], there are 20 lines
For On-demand call send to receive time (T3 - T1), the min is 92.0 ms, max is 173.0 ms, mean is 146.789 ms, median is 148.0
For keyword [[METRICS] Elapsed time for port operation took:], there are 33 lines
For Port GS Process Time, the min is 0.0 ms, max is 59.0 ms, mean is 7.606 ms, median is 8.0
For keyword [[METRICS] Elapsed time for neighbor operation took:], there are 33 lines
For Neighbor GS Process Time, the min is 0.0 ms, max is 49.0 ms, mean is 9.788 ms, median is 10.0
This file has 8794 lines
For keyword [From received hostOperation to before response], there are 20 lines
For On-demand call received to response sent time (~T3 - T2), the min is 60.0 ms, max is 125.0 ms, mean is 108.850 ms, median is 111.0
For keyword [retrieved vpc resource metadata, elapsed Time in milli seconds:], there are 20 lines
For Retrieve VPC resource metadata time, the min is 5.0 ms, max is 20.0 ms, mean is 16.050 ms, median is 17.0
This file has 2977 lines, total pings: 20, successful_pings: 20, failed pings: 0
For keyword [bytes from], there are 20 lines
For Ping Speed, the min is 95.865 ms, max is 188.29 ms, mean is 155.183 ms, median is 158.5075
Ping speed is as follows:
150.454
95.865
169.558
167.035
128.88
128.794
151.262
175.191
170.386
173.43
131.117
171.236
188.29
183.667
147.877
171.802
165.753
148.696
137.401
146.969
Done printing ping speed
This ACA log file has 20 neighbor IDs
This ACA log file has 20 on-demand call UUIDs
For host operation request grpc delay, the min is 0.0 ms, max is 25.0 ms, mean is 12.150 ms, median is 10.0
For host operation reply grpc delay, the min is 22.0 ms, max is 30.0 ms, mean is 25.100 ms, median is 24.0
There are totally 20 unique neighbor IDs, and 20 of them appear both in ACA and NCM log
For ncm push goalstate grpc delay, the min is 2.0 ms, max is 21.0 ms, mean is 7.400 ms, median is 3.5
For ACA received ncm reply to packet out time, the min is 1.0 ms, max is 15.0 ms, mean is 6.150 ms, median is 4.5