Closed ragmani closed 2 months ago
This issue may be related to #13223, because the above model worked well in the past.
After #13514
The log of the linearized nodes
[ ExecutorFactory] Linearize for forwarding order
[ Linearize ] %29 = @10_Permute( %0)
[ Linearize ] %3 = @0_Conv2D( %29, %1, %28)
[ Linearize ] %5 = @1_Conv2D( %3, %4, %2)
[ Linearize ] %8 = @2_Conv2D( %5, %6, %27)
[ Linearize ] %10 = @3_Conv2D( %8, %9, %7)
[ Linearize ] %13 = @4_Conv2D( %10, %11, %26)
[ Linearize ] %15 = @5_Conv2D( %13, %14, %12)
[ Linearize ] %18 = @6_Conv2D( %15, %16, %25)
[ Linearize ] %20 = @7_Conv2D( %18, %19, %17)
[ Linearize ] %22 = @8_Reshape( %20, %21)
[ Linearize ] %30 = @11_Permute( %23)
[ Linearize ] %24 = @9_MeanSquaredErrorLoss( %22, %30)
[ Linearize ] %31 = @12_Permute( %22)
[ ExecutorFactory] Linearize for backwarding order
[ Linearize ] %24 = @9_MeanSquaredErrorLoss( %22, %30)
[ Linearize ] %22 = @8_Reshape( %20, %21)
[ Linearize ] %20 = @7_Conv2D( %18, %19, %17)
[ Linearize ] %18 = @6_Conv2D( %15, %16, %25)
[ Linearize ] %15 = @5_Conv2D( %13, %14, %12)
[ Linearize ] %13 = @4_Conv2D( %10, %11, %26)
[ Linearize ] %10 = @3_Conv2D( %8, %9, %7)
[ Linearize ] %8 = @2_Conv2D( %5, %6, %27)
[ Linearize ] %5 = @1_Conv2D( %3, %4, %2)
[ Linearize ] %3 = @0_Conv2D( %29, %1, %28)
The log from ConstantInsertionPass
[ PassRunner ] Start running 'ConstantInsertionPass'
[ ConstInsertPass] New operand %25 added(copy of %17) for (train/NHWC)
[ ConstInsertPass] New operand %26 added(copy of %12) for (train/NHWC)
[ ConstInsertPass] New operand %27 added(copy of %7) for (train/NHWC)
[ ConstInsertPass] New operand %28 added(copy of %2) for (train/NHWC)
[ PassRunner ] Finished running 'ConstantInsertionPass'
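For readers who want to see the idea behind the log above, here is a minimal sketch in Python of giving each user of a shared constant its own operand. It is not the onert implementation. The `Op`/`Graph` classes, the `insert_constant_copies` function, the pre-pass graph (reconstructed from the operand numbers in the logs), the choice of which operands are constants, and the reverse walk order are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    inputs: list[int]  # operand indices, e.g. [0, 1, 2] for (%0, %1, %2)
    output: int


@dataclass
class Graph:
    ops: list[Op]
    constants: set[int]  # operand indices assumed to hold constant data
    num_operands: int    # next free operand index
    copies: dict[int, int] = field(default_factory=dict)  # new operand -> original


def insert_constant_copies(graph: Graph) -> None:
    """Rewire the graph so that no constant operand is used by two ops."""
    claimed: set[int] = set()
    # Assumption: ops are walked from last to first, and the first op seen
    # keeps the original operand. The real pass may iterate differently.
    for op in reversed(graph.ops):
        for pos, operand in enumerate(op.inputs):
            if operand not in graph.constants:
                continue
            if operand not in claimed:
                claimed.add(operand)  # first user keeps the original
                continue
            # Any further user gets a fresh operand that copies the original.
            new_index = graph.num_operands
            graph.num_operands += 1
            graph.constants.add(new_index)
            claimed.add(new_index)
            graph.copies[new_index] = operand
            op.inputs[pos] = new_index


def build_example() -> Graph:
    # Hypothetical pre-pass graph: each Conv2D pair shares its third input.
    ops = [
        Op("@0_Conv2D", [0, 1, 2], 3),
        Op("@1_Conv2D", [3, 4, 2], 5),
        Op("@2_Conv2D", [5, 6, 7], 8),
        Op("@3_Conv2D", [8, 9, 7], 10),
        Op("@4_Conv2D", [10, 11, 12], 13),
        Op("@5_Conv2D", [13, 14, 12], 15),
        Op("@6_Conv2D", [15, 16, 17], 18),
        Op("@7_Conv2D", [18, 19, 17], 20),
    ]
    constants = {1, 2, 4, 6, 7, 9, 11, 12, 14, 16, 17, 19}
    # %0..%24 are assumed to exist already, so new operands start at %25.
    return Graph(ops=ops, constants=constants, num_operands=25)


if __name__ == "__main__":
    g = build_example()
    insert_constant_copies(g)
    for new, original in g.copies.items():
        print(f"%{new} is a copy of %{original}")
```

With this walk order the sketch adds %25, %26, %27 and %28 as copies of %17, %12, %7 and %2, which is the same pairing the log reports. After the pass, each Conv2D in a pair reads a distinct operand, so a training update to one no longer touches the other.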
However, #13514 causes memory issues in inference, so it is going to be applied to training only.
Done.
What
Let's enable ConstantInsertionPass again for training of models that share constant data across multiple operation nodes.
Why
Some models share a constant operand between nodes, like the Conv2D node pairs (@0, @1), (@2, @3), (@4, @5), (@6, @7) in the log below. For training, every constant must be split into a separate operand, because constant operands can be updated if they are training parameters (weights, biases, etc.). However, ConstantInsertionPass currently does not add new operands for them.
ConstantInsertionPass
PR
13514
13517