Open lockshaw opened 1 year ago
To reproduce:
./mlp_unify -ll:gpu 1 -ll:fsize 14000 -ll:zsize 14000 --budget 20 --search-num-nodes 4 --search-num-workers 2
Output:
[0 - 7fd6c0699000] 0.267443 {3}{Mapper}: Enabled Control Replication Optimizations.
[0 - 7fd6c0699000] 0.267524 {3}{Mapper}: Enabled Control Replication Optimizations.
batchSize(64) workersPerNodes(1) numNodes(1)
workSpaceSize (1024 MB)
num_nodes = 4 num_gpus_per_node = 2
mlp_unify: /home/mengdiwu/FlexFlow/src/runtime/graph.cc:486: bool FlexFlow::PCG::Graph::check_correctness(): Assertion `srcTensor->dims[i] == dstTensor->dims[i]' failed.
Aborted (core dumped)
Potentially caused by #298.
Status: blocked on NoOp simulator fix, i.e., #346
May be closed by #622 and #680 (which close #346), so this needs to be rechecked once both are merged.