What changes were proposed in this pull request?
Don't register RemoteShuffleResultPartition to the partition manager, because its resources are not stored on the TM. In theory, this could also be fixed on the Flink side by no longer registering remote/cluster partitions to the partition manager, but such a fix would only be released after Flink 2.0 at the earliest.
Why are the changes needed?
RemoteShuffleResultPartition is registered to the partition manager during the setup phase. Because it is a cluster partition (its resources are not stored on the Flink TM), Flink does not trigger resource release on the TM. In a session cluster, the partition object is therefore leaked. More seriously, because the partition is never released, the idle TM cannot be reclaimed by the Flink resource manager.
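The leak and the proposed fix can be illustrated with a minimal, self-contained sketch. This is a toy model, not the actual Flink or Celeborn code: `PartitionManager`, `Partition`, `RemotePartition`, and `registeredAfterSetup` are hypothetical names chosen for illustration, and the assumption is simply that registration happens in `setup()` and that no release is ever triggered for a cluster partition.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionLeakSketch {

    /** Hypothetical stand-in for the TM-side partition manager. */
    static class PartitionManager {
        private final Map<String, Partition> registered = new HashMap<>();

        void register(Partition partition) {
            registered.put(partition.id, partition);
        }

        // In this model, release is only ever triggered for partitions
        // whose resources live on the TM.
        void release(String id) {
            registered.remove(id);
        }

        int size() {
            return registered.size();
        }
    }

    /** Hypothetical base partition: registers itself during setup. */
    static class Partition {
        final String id;
        final PartitionManager manager;

        Partition(String id, PartitionManager manager) {
            this.id = id;
            this.manager = manager;
        }

        void setup() {
            manager.register(this);
        }
    }

    /** Hypothetical remote partition: setup skips registration entirely. */
    static class RemotePartition extends Partition {
        RemotePartition(String id, PartitionManager manager) {
            super(id, manager);
        }

        @Override
        void setup() {
            // Intentionally no registration: nothing is stored on the TM,
            // so there is nothing for the partition manager to release.
        }
    }

    /** Returns how many partitions remain registered after setup. */
    static int registeredAfterSetup(boolean skipRegistration) {
        PartitionManager manager = new PartitionManager();
        Partition partition = skipRegistration
                ? new RemotePartition("p-1", manager)
                : new Partition("p-1", manager);
        partition.setup();
        // No release call here: this models release never being triggered
        // on the TM for a cluster partition.
        return manager.size();
    }

    public static void main(String[] args) {
        System.out.println("registered without the fix: " + registeredAfterSetup(false));
        System.out.println("registered with the fix:    " + registeredAfterSetup(true));
    }
}
```

In this sketch, the default `setup()` leaves one entry in the manager that is never removed, while the overriding `setup()` leaves the manager empty, which is the state that would let an idle TM be reclaimed.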
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually tested in a session cluster.