Rob will implement a solution for intrusive network performance testing across our WAN. Once design decisions are made we'll close this issue and open a new issue for deployment.
W, 20140716: my initial notes before meeting with Rob:
TESTING
scripts should conduct exhaustive point-to-point iperf testing among our hosts; an evolution of the testing represented in this manually-constructed test matrix
capture errors and throughput from our NICs
within each test epoch the number of simultaneous iperf tests across WAN links/equipment should be systematically varied to find problems that interact with WAN capacity
within-switch tests should be included even though they aren't ecologically useful: they demonstrate ability of our NICs to saturate their individual links, ruling them out as trouble sources
frequency: ??
minimum duration of each point-to-point test: > 1hr?
we'll add NFS testing once the test system is working with iperf data
WAN sites to start: GNAX, WMB, 1599, VA 12th floor
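A minimal sketch of how the test schedule above could be generated, offered as a starting point for the design discussion rather than a decided approach. The host names are placeholders, "same-site" is used as a rough stand-in for "within-switch", and the function names (`all_pairs`, `batches`, `iperf_command`) are hypothetical. The batching is a simple greedy pass that never uses one host in two simultaneous tests, so a shared NIC is not confused with WAN congestion.

```python
from itertools import permutations

# Site names come from the notes; host names are made-up placeholders.
HOSTS = {
    "GNAX": ["gnax-a", "gnax-b"],
    "WMB": ["wmb-a"],
    "1599": ["1599-a"],
    "VA 12th floor": ["va12-a"],
}

def all_pairs(hosts_by_site):
    """Return every ordered (client, server, kind) combination of hosts."""
    site_of = {h: s for s, hs in hosts_by_site.items() for h in hs}
    pairs = []
    for client, server in permutations(site_of, 2):
        # Assumption: hosts at the same site stand in for within-switch tests.
        kind = "same-site" if site_of[client] == site_of[server] else "wan"
        pairs.append((client, server, kind))
    return pairs

def batches(pairs, concurrency):
    """Greedily group pairs into batches of at most `concurrency`
    simultaneous tests, never reusing a host within one batch."""
    remaining = list(pairs)
    out = []
    while remaining:
        batch, busy, deferred = [], set(), []
        for pair in remaining:
            client, server, _ = pair
            if len(batch) < concurrency and client not in busy and server not in busy:
                batch.append(pair)
                busy.update((client, server))
            else:
                deferred.append(pair)
        out.append(batch)
        remaining = deferred
    return out

def iperf_command(client, server, duration_s):
    """Command to run on `client` (e.g. over ssh), assuming an iperf
    server is already listening on `server`. Duration is still undecided."""
    return ["iperf", "-c", server, "-t", str(duration_s)]
```

Calling `batches(all_pairs(HOSTS), n)` for each concurrency level `n` within a test epoch would give the systematic variation in simultaneous tests; with only five placeholder hosts, at most two host-disjoint tests can run at once.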
VISUALIZATION
some dimension-reduced version of the results should be aggregated and alarmed in zabbix (e.g., percent of "failed" point-to-point tests, possibly grouped by 1) WAN site and/or 2) number of simultaneous connections)
scripts should generate a detailed matrix of data for each test epoch, so that a snapshot of a single WAN test can be sent to network providers with nearly all of the information they need to check against their logs; again, an improved version of this manually-constructed test matrix
zabbix time-series plots and the detailed matrices should hyperlink to each other
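A rough sketch of the dimension-reduced metric described above: percent of "failed" tests, grouped by one variable. What counts as "failed" has not been decided, so this assumes a throughput threshold (`min_mbps`); the record fields and the function name `percent_failed` are likewise hypothetical.

```python
from collections import defaultdict

def percent_failed(results, key, min_mbps):
    """Percent of tests below `min_mbps`, grouped by the field `key`.

    Assumption: a test "fails" when measured throughput is under the threshold.
    """
    totals = defaultdict(int)
    fails = defaultdict(int)
    for r in results:
        group = r[key]
        totals[group] += 1
        if r["mbps"] < min_mbps:
            fails[group] += 1
    return {g: 100.0 * fails[g] / totals[g] for g in totals}

# Made-up records for illustration only, not real measurements.
example = [
    {"site": "GNAX", "concurrency": 1, "mbps": 900},
    {"site": "GNAX", "concurrency": 4, "mbps": 200},
    {"site": "WMB", "concurrency": 1, "mbps": 850},
    {"site": "WMB", "concurrency": 4, "mbps": 820},
]

by_site = percent_failed(example, "site", min_mbps=500)
by_concurrency = percent_failed(example, "concurrency", min_mbps=500)
```

Each resulting value (one per site, or one per concurrency level) would map naturally to a single zabbix item with its own trigger, while the full per-epoch records feed the detailed matrix.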