Closed pradiptapks closed 2 years ago
@portante On perf122, I am downgrading pbench so I can continue my binary-search test. Please let me know once a fix is available for this.
Our first attempt at a fix for this was PR #2071 (https://github.com/distributed-system-analysis/pbench/pull/2071), which was flawed: the change caused other problems.
Our second attempt is now in PR #2090, where we are working first against the b0.69 branch to methodically (via lots of small PRs) massage the code into a better state to address the issue. PR #2090 addresses the same problem as PR #2071 by back-porting the originally proposed fix, while working to ensure the change does not break the rest of the code.
Further work will be required to ensure that the generated result.json files in the iteration hierarchies are valid and work with the dashboard code.
This report looks like an incompatibility between Pbench and Trafficgen on the invocation side, not on the output interpretation side (which is what I think #2071 addresses): it looks like the benchmark script is specifying --rate-unit=mpps to the traffic generator and it is failing as a result, which then causes knock-on effects, like the output directory not existing.
However, the logging indicates that pbench-trafficgen was invoked with --rate-unit=%, so I'm not sure how it ended up falling back on the default "mpps", but that's a second problem.
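One way to check which rate unit actually reached binary-search is to pull the value out of the Namespace(...) dump in the captured log. The following is only a minimal sketch: the function name extract_rate_unit is hypothetical, and it assumes the log was saved to a file (for example the $LOG file written by tee in the reproduction command) and that the dump contains a rate_unit='...' field as in this report.

```shell
# Hypothetical helper (not part of pbench): print every rate_unit value
# found in the log file given as $1.
# Assumes the value is logged as rate_unit='<value>' inside Namespace(...).
extract_rate_unit() {
    # grep -o prints only the matching part; cut keeps the text between
    # the first pair of single quotes.
    grep -o "rate_unit='[^']*'" "$1" | cut -d"'" -f2
}
```

Running extract_rate_unit "$LOG" after a failed run and comparing the result with the $UNIT value passed on the command line would show whether the unit was changed somewhere between pbench-trafficgen and binary-search.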
TrafficGen is no longer supported as of v0.71.
Using the pbench test repo, I updated the pbench-agent package. However, a test run of pbench-trafficgen then failed, reporting that the tools-default directory doesn't exist.
Platform: Red Hat Enterprise Linux release 8.2 (Ootpa)
Steps to reproduce:
time pbench-trafficgen --traffic-generator=trex-txrx-profile \
    --devices=$PCI_INFO --traffic-profile=$PROFILE --rate=$RATE --rate-unit=$UNIT \
    --samples=$SAMPLE --max-loss-pct=$LOSS --config=$CONFIG \
    --tool-period=binary-search --skip-git-pull \
    --search-runtime=$SEARCH_TIME --validation-runtime=$VALIDATION_TIME \
    -- --rate-tolerance-failure=fail --disable-upward-search \
    --loss-granularity=segment 2>&1 | tee $LOG
<..trim..>
trex-server is ready
Total number of benchmark iterations: 1
Starting iteration[1-psahoo-profile-bs.json-0.0pct_drop] (1 of 1)
test sample 1 of 1
[pbench-tool-trigger] starting trigger processing of STDIN using tool group default triggers at /var/lib/pbench-agent/tools-v1-default/trigger
[pbench-tool-trigger] start-trigger:"Starting binary-search" stop-trigger:"Finished binary-search"
[2021-02-02 03:01:00.439716][BSO] Namespace(active_device_pairs='0:1,2:3,4:5,6:7', device_pairs='0:1,2:3,4:5,6:7', disable_upward_search=True, dst_ips='', dst_macs='', dst_ports='', duplicate_packet_failure_mode='quit', enable_flow_cache=True, enable_segment_monitor=False, enable_trex_profiler=True, encap_dst_ips='', encap_dst_macs='', encap_src_ips='', encap_src_macs='', frame_size='64', latency_device_pair='--', latency_rate=1000, loss_granularity='segment', max_loss_pct=0.0, max_retries=1, measure_latency=1, min_rate=0.0, negative_packet_loss_mode='quit', no_promisc=False, num_flows=1024, one_shot=0, output_dir='/var/lib/pbench-agent/trafficgen_Trial_tg:trex-profile_pf:psahoo-profile-bs.json_ml:0.0_tt:bs_2021-02-02T03:00:29/1-psahoo-profile-bs.json-0.0pct_drop/sample1', packet_protocol='UDP', pre_trial_cmd='', process_all_profiler_data=False, random_seed=0.3089222808406439, rate=0.0, rate_tolerance=3.0, rate_tolerance_failure='fail', rate_unit='mpps', repeat_final_validation=False, runtime_tolerance=5, search_granularity=0.1, search_runtime=60, send_teaching_measurement=False, send_teaching_warmup=False, sniff_runtime=30, src_ips='', src_macs='', src_ports='', stream_mode='continuous', teaching_measurement_interval=10.0, teaching_measurement_packet_rate=1000, teaching_measurement_packet_type='', teaching_warmup_packet_rate=1000, teaching_warmup_packet_type='', traffic_direction='bidirectional', traffic_generator='trex-txrx-profile', traffic_profile='/var/lib/pbench-agent/trafficgen_Trial_tg:trex-profile_pf:psahoo-profile-bs.json_ml:0.0_tt:bs_2021-02-02T03:00:29/1-psahoo-profile-bs.json-0.0pct_drop/psahoo-profile-bs.json', trex_host='localhost', trex_profiler_interval=3.0, trial_gap=0, use_device_stats=False, use_dst_ip_flows=1, use_dst_mac_flows=1, use_dst_port_flows=0, use_encap_dst_ip_flows=0, use_encap_dst_mac_flows=0, use_encap_src_ip_flows=0, use_encap_src_mac_flows=0, use_protocol_flows=0, use_src_ip_flows=1, use_src_mac_flows=1, use_src_port_flows=0, validation_runtime=120, vlan_ids='', vxlan_ids='', warmup_traffic_profile='', warmup_trial=False, warmup_trial_runtime=30)
[2021-02-02 03:01:00.439949][BSO] The trex-txrx-profile traffic generator does not support --rate-unit=mpps
[error][2021-02-02T03:01:00.458058888] iteration 1-psahoo-profile-bs.json-0.0pct_drop sample 1 returned non-zero exit code - 1
[error][2021-02-02T03:01:00.467098421] [pbench-stop-tools] expected tool output directory, "/var/lib/pbench-agent/trafficgen_Trial_tg:trex-profile_pf:psahoo-profile-bs.json_ml:0.0_tt:bs_2021-02-02T03:00:29/1-default/sample1/tools-default", does not exist
tool triggers did not fire for iteration/sample, '1-psahoo-profile-bs.json-0.0pct_drop/sample1'
[error][2021-02-02T03:01:00.470171817] Aborting benchmark
killing existing trex server
real 0m36.902s