apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

Failed in processing TensorFlow graph MLCSubgraphOp_0_9 #218

Open federicolucca opened 3 years ago

federicolucca commented 3 years ago

Hi,

I am working on a version of Mask_RCNN (https://github.com/matterport/Mask_RCNN) on TF 2.0 for Apple Silicon. I have converted the project to TF 2.4 and it works; I mean there are no warnings or errors when the model starts up normally. After the first epoch the trouble begins, with the conversion of the TensorFlow graph to ML Compute.

2021-03-31 18:05:14.918575: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph MLCSubgraphOp_0_9 with frame_id = 0 and iter_id = 0 with error: Internal: AddLayerToMLCGraph: Received nil MLCLayer to be added. (error will be reported 5 times unless TF_MLC_LOGGING=1).

2021-03-31 18:05:36.864282: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph proposal_targets/MLCSubgraphOp_0_35 with frame_id = 0 and iter_id = 0 with error: Internal: PreprocessForwardOp: Obtained nil MLCTensor for input 0 into n12 = proposal_targets/sub_37 (Sub), whose parent op: n2 = mlcinput_0_35_0 (_Arg) fails to create a MLCTensor. (error will be reported 5 times unless TF_MLC_LOGGING=1).

2021-03-31 18:05:36.864621: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph proposal_targets/MLCSubgraphOp_0_37 with frame_id = 0 and iter_id = 0 with error: Internal: PreprocessForwardOp: Obtained nil MLCTensor for input 0 into n12 = proposal_targets/sub_9 (Sub), whose parent op: n2 = mlcinput_0_37_0 (_Arg) fails to create a MLCTensor. (error will be reported 5 times unless TF_MLC_LOGGING=1).

2021-03-31 18:05:36.873844: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph roi_align_mask/MLCSubgraphOp_0_14 with frame_id = 0 and iter_id = 0 with error: Internal: AddLayerToMLCGraph: Failed to add layer to MLCGraph for node: n10 = roi_align_mask/truediv_1 (RealDiv). (error will be reported 5 times unless TF_MLC_LOGGING=1).

After this there is the usual "segmentation fault" and training exits.

Can someone help me?

pietmlr commented 3 years ago

@federicolucca I'm facing that error too, and I just cannot fix it... I'm wondering why this error isn't fixed yet; it seems to be an issue for a lot of people.

brunobpr commented 3 years ago

@federicolucca and @redmlr, I got the same error, and the only solution I found was to disable ML Compute. Try running export TF_DISABLE_MLC=1 in the Terminal.
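
For anyone who launches training from a script or notebook rather than from a shell, the same workaround can probably be applied from Python. This is only a minimal sketch: TF_DISABLE_MLC comes from the command above, but setting it through os.environ before importing TensorFlow is an assumption (it presumes the variable is read when TensorFlow initialises), so verify it on your own setup.

```python
import os

# Workaround from this thread: disable ML Compute via the TF_DISABLE_MLC variable.
# Assumption: the variable must be set before TensorFlow is imported,
# mirroring what `export TF_DISABLE_MLC=1` does for a shell session.
os.environ["TF_DISABLE_MLC"] = "1"

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
# ...build and train the model as usual...

print("TF_DISABLE_MLC =", os.environ["TF_DISABLE_MLC"])
```

Note that this only sidesteps the MLCSubgraphOp failures by turning ML Compute off, so training runs without that acceleration.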

federicolucca commented 3 years ago

Thanks, it's working. But will there be a patch or an update in the future that makes it possible to use ML Compute?

Thanks

brunobpr commented 3 years ago

I hope there will be a better solution. Disabling MLC is nowhere near ideal, but after days of trying to fix it, this was the only thing that worked.

gregdurham commented 3 years ago

Seeing this error as well on a similar machine...is this in the process of being resolved?