Eevan-zq opened this issue 1 month ago
The new version of msccl-tools fixed this error.
By the way,
1: When I run this command:
the XML header is:
Why are minBytes and maxBytes both 0? Does this have any impact?
2: The following appears at the end of this XML file:
This may be caused by an error in the final Check validation in allreduce_a100_pcie_hierarchical.py:
I am not sure whether the XML file generated by running python ./allreduce_a100_pcie_hierarchical.py --protocol=LL 8 1 > test.xml is correct.
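A quick first check on test.xml is whether it parses as a single well-formed XML document. Because the generator's stdout is redirected with >, anything else the script prints after the XML (possibly output from the final Check step, though that is not confirmed here) also ends up in the file. The sketch below uses only Python's standard xml.etree.ElementTree; the helper name check_msccl_xml is hypothetical, and it checks well-formedness only, not whether the generated algorithm is correct.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_msccl_xml(path: str) -> Tuple[bool, str]:
    """Return (ok, message) for a generated XML file (hypothetical helper).

    Only checks that the file is one well-formed XML document; it does not
    validate the algorithm described inside it.
    """
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        # Extra text printed after the closing root tag is usually reported
        # by the parser as "junk after document element".
        return False, f"not well-formed XML: {err}"
    return True, f"well-formed; root element is <{root.tag}>"
```

For example, check_msccl_xml("test.xml") returns (False, ...) if trailing text follows the root element, which would point at extra output being redirected into the file.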
Why wasn't the algorithm I generated from the XML with msccl-tools invoked when I executed this command:
I checked the code here and found that status.algoMetas.size() is 0, so I traced it to here:
I found that none of the .xml files generated by msccl-tools contain minBytes. Is this why the algorithm in the XML wasn't scheduled when I ran the mpirun command? If so, what should I do?
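To confirm which generated files actually carry minBytes and maxBytes, a small scan over a directory of XML files can report each root element's values. This is a sketch under two assumptions: that these attributes, if present, sit on the root element (as the header described in point 1 suggests), and that the directory holds only the generated algorithm files. The helper name report_size_range is hypothetical, and the sketch does not establish how the runtime treats a missing value or a value of 0.

```python
import glob
import os
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Attribute names as mentioned in the issue; assumed to be on the root element.
SIZE_ATTRS = ("minBytes", "maxBytes")


def report_size_range(xml_dir: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each .xml file in xml_dir to its root's size attributes.

    A value of None means the attribute is absent. Files that fail to parse
    are reported under a "parse_error" key instead.
    """
    report: Dict[str, Dict[str, Optional[str]]] = {}
    for path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
        name = os.path.basename(path)
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError as err:
            report[name] = {"parse_error": str(err)}
            continue
        # Element.get returns None when the attribute is missing.
        report[name] = {attr: root.get(attr) for attr in SIZE_ATTRS}
    return report
```

Running it over the output directory and printing the result shows at a glance whether the attributes are missing, present but 0, or set to a real byte range in each file.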