flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

Assertion error in MLP_unify #339

Open lockshaw opened 1 year ago

lockshaw commented 1 year ago

To reproduce: ./mlp_unify -ll:gpu 1 -ll:fsize 14000 -ll:zsize 14000 --budget 20 --search-num-nodes 4 --search-num-workers 2

Output:

[0 - 7fd6c0699000]    0.267443 {3}{Mapper}: Enabled Control Replication Optimizations.
[0 - 7fd6c0699000]    0.267524 {3}{Mapper}: Enabled Control Replication Optimizations.
batchSize(64) workersPerNodes(1) numNodes(1)
workSpaceSize (1024 MB)
num_nodes = 4 num_gpus_per_node = 2
mlp_unify: /home/mengdiwu/FlexFlow/src/runtime/graph.cc:486: bool FlexFlow::PCG::Graph::check_correctness(): Assertion `srcTensor->dims[i] == dstTensor->dims[i]' failed.
Aborted (core dumped)

Potentially caused by #298.
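
For readers unfamiliar with the failing check: the assertion message indicates that the tensor on the source side of a graph edge and the tensor on the destination side are compared dimension by dimension. Below is a minimal sketch of that kind of check, written to report which dimension disagrees instead of aborting. `TensorShape` and `first_dim_mismatch` are hypothetical names used only for illustration; this is not FlexFlow's actual code or types, and treating a rank difference as a mismatch is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a tensor shape (not FlexFlow's real type).
struct TensorShape {
  std::vector<int> dims;
};

// Returns the index of the first dimension where src and dst disagree,
// or -1 if the shapes are identical. If one shape is a strict prefix of
// the other, the first index past the shorter shape is reported.
int first_dim_mismatch(const TensorShape& src, const TensorShape& dst) {
  const std::size_t common = std::min(src.dims.size(), dst.dims.size());
  for (std::size_t i = 0; i < common; ++i) {
    if (src.dims[i] != dst.dims[i]) {
      return static_cast<int>(i);
    }
  }
  if (src.dims.size() != dst.dims.size()) {
    return static_cast<int>(common);
  }
  return -1;
}
```

Logging the mismatching index together with both sizes for each edge would narrow down which operator pair produces the inconsistent shapes in the searched graph.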

reyna-abhyankar commented 1 year ago

Status: blocked on NoOp simulator fix

lockshaw commented 1 year ago

i.e., #346

lockshaw commented 1 year ago

May be resolved by #622 and #680 (which close #346), so this needs to be rechecked once both are merged.