flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

`sink_nodes.size() == 1` assertion fails #356

Open goliaro opened 1 year ago

goliaro commented 1 year ago

I'm currently getting the error below when running the `print_layers.py` example as part of the `python/test.sh` test script:

++ ./flexflow_python /usr/FlexFlow/examples/python/native/print_layers.py -ll:py 1 -ll:gpu 1 -ll:fsize 4096 -ll:zsize 12192 --epochs 5
[0 - 7f39433c6700]    0.395783 {3}{Mapper}: Enabled Control Replication Optimizations.
[0 - 7f39433c6700]    0.395876 {3}{Mapper}: Enabled Control Replication Optimizations.
[0 - 7f39433c6700]    0.395901 {3}{Mapper}: Enabled Control Replication Optimizations.
Using cffi flexflow bindings.
Using flexflow python
start top-level task
top-level task
alexnet
Python API batchSize(64) workersPerNodes(1) numNodes(1)
workSpaceSize (1024 MB)
num_nodes = 1 num_gpus_per_node = 1
flexflow_python: /usr/FlexFlow/src/runtime/substitution.cc:677: FlexFlow::PCG::Node FlexFlow::PCG::Graph::find_sink_node() const: Assertion `sink_nodes.size() == 1' failed.
./test.sh: line 64: 1011460 Aborted                 (core dumped) ./flexflow_python $FF_HOME/examples/python/native/print_layers.py -ll:py 1 -ll:gpu $GPUS -ll:fsize $LEGION_FSIZE_SMALL -ll:zsize $LEGION_ZSIZE --epochs 5
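
For readers unfamiliar with the assertion, here is a minimal sketch of what a check like `sink_nodes.size() == 1` guards against. It assumes the usual graph-theoretic meaning of a sink node (a node with no outgoing edges); the actual logic of `find_sink_node()` in `substitution.cc` may differ. The helper `find_sink_nodes` and all layer names are hypothetical, and the sketch does not claim to reproduce the cause of this particular failure.

```python
def find_sink_nodes(nodes, edges):
    """Return the nodes that have no outgoing edge, in input order."""
    has_outgoing = {src for src, _ in edges}
    return [n for n in nodes if n not in has_outgoing]


# Hypothetical linear model: every layer feeds exactly one successor.
chain_nodes = ["input", "conv1", "pool1", "dense", "softmax"]
chain_edges = [
    ("input", "conv1"),
    ("conv1", "pool1"),
    ("pool1", "dense"),
    ("dense", "softmax"),
]
chain_sinks = find_sink_nodes(chain_nodes, chain_edges)

# Same model plus a hypothetical extra layer whose output nothing consumes.
branch_nodes = chain_nodes + ["aux_output"]
branch_edges = chain_edges + [("pool1", "aux_output")]
branch_sinks = find_sink_nodes(branch_nodes, branch_edges)

print(chain_sinks)   # ['softmax']
print(branch_sinks)  # ['softmax', 'aux_output']
```

Under this assumption, a single-sink check holds for the first graph and fails for the second, because two nodes end up without consumers.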

lockshaw commented 1 year ago

I'll take a look at this when I get some time