Open dmonakhov opened 5 months ago
I use the aws-platform.yaml config file to pass platform characteristics that never change:
version: AWS-0.1
spec: dcgm-diag-v1
skus:
  - name: NVIDIA H100 80GB HBM3 p5.48xlarge
    id: 2330
    pcie:
      is_allowed: true
      h2d_d2h_single_pinned:
        min_pci_generation: 5.0
        min_pci_width: 16.0
        min_bandwidth: 14.0
        max_latency: 5
      h2d_d2h_single_unpinned:
        min_pci_generation: 5.0
        min_pci_width: 16.0
        min_bandwidth: 14.0
      gpu_nvlinks_expected_up: 18
      nvswitch_nvlinks_expected_up: 6
But I also want to customize other parameters, such as test_duration, and use the --parameters option for this:
dcgmi diag --verbose --json --configfile diag-aws.yaml --run long --parameters memtest.test_duration=120
However, it appears that the --configfile option is silently ignored if the --parameters option is present, and nvvs is called in configless mode:
/usr/share/nvidia-validation-suite/nvvs -j -z --specifiedtest long --parameters memtest.test_duration=120 --configless -v --indexes 0,1,2,3,4,5,6,7
This is very counterintuitive and makes quick parameter prototyping hard, because either the config file or --parameters can be used, but not both. Passing all system parameters with --parameters does not seem practical.
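To make the expected behavior concrete, here is a minimal Python sketch of the merge semantics I have in mind: values from the config file act as the base, and --parameters entries override or extend them per test. This is not how dcgmi or nvvs is implemented. The helper names parse_parameters and merge_sku are hypothetical, the SKU entry is represented as a plain dict, and the sketch assumes parameters are given as semicolon-separated test.param=value pairs.

```python
import copy


def parse_parameters(spec):
    """Parse 'test.param=value;test.param=value' into {test: {param: value}}.

    Hypothetical helper; values are kept as strings, exactly as typed.
    """
    overrides = {}
    for item in filter(None, (s.strip() for s in spec.split(";"))):
        key, sep, value = item.partition("=")
        test, dot, param = key.partition(".")
        if not sep or not dot or not test or not param:
            raise ValueError(f"malformed parameter: {item!r}")
        overrides.setdefault(test, {})[param] = value
    return overrides


def merge_sku(sku, overrides):
    """Return a copy of the SKU entry with command-line overrides applied.

    Settings from the config file are kept unless the same test.param
    is overridden; the input dict is not modified.
    """
    merged = copy.deepcopy(sku)
    for test, params in overrides.items():
        merged.setdefault(test, {}).update(params)
    return merged


# Base values as they would come from the config file (abbreviated).
sku = {
    "name": "NVIDIA H100 80GB HBM3 p5.48xlarge",
    "id": 2330,
    "pcie": {"is_allowed": True, "gpu_nvlinks_expected_up": 18},
}

effective = merge_sku(sku, parse_parameters("memtest.test_duration=120"))
```

With this behavior, the pcie settings from the file would survive and memtest.test_duration would be added on top, instead of the file being dropped entirely.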