GoogleCloudPlatform / flink-on-k8s-operator

[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
Apache License 2.0
657 stars 265 forks source link

Update/remove offheap memory settings #411

Open EnriqueL8 opened 3 years ago

EnriqueL8 commented 3 years ago

As of Flink 1.11, the properties for setting the offheap memory have changed and no longer match the ones set by the operator. Docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_migration.html#taskmanager-heap-size. These properties are always set due to the default webhook setting the offHeapRatio and offHeapMin fields.

Has there been any discussions to remove the default memory settings since these would be different for each Flink version? Or to update the function to handle multiple versions? IMO harder to handle multiple versions, and easier to let the user configure all the properties directly to match the Flink version they are using.

There is a TODO comment in the API to watch for https://issues.apache.org/jira/browse/FLINK-13980 since this has been resolved I believe these fields are no longer needed.

elanv commented 3 years ago

There is a related discussion, but it wasn't going any further and time passed, so it would be nice to discuss it again. https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/288 Withspec.taskManager.{memoryOffHeapRatio,memoryOffHeapMin} and spec.jobManager.{memoryOffHeapRatio,memoryOffHeapMin} which are the cause of the problem, operator automatically set taskmanager.heap.size and jobmanager.heap.size. These were deprecated at 1.10 and 1.11 respectively, but the default value of the FlinkCluster fields interfered with the new memory configurations *.memory.process.size.

I also think it's a good idea to remove the default values of offheap related fields to eliminate interference. Then you can clearly guide that those fields are only available in versions 1.9 and 1.10 or lower in the CRD document .

However, *.memory.process.size is a required configuration, so if there is memory limit value, it seems necessary to set it to the same value. The configuration will be ignored in version 1.9 and below, so there should be no problem.

EnriqueL8 commented 3 years ago

Agreed, doesn't it make more sense to set *.memmory.process.size to the memory requested value instead of the memory limit for the container?

EnriqueL8 commented 3 years ago

Could these environment variables also be deleted? https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/controllers/flinkcluster_converter.go#L117-L134

elanv commented 3 years ago

Even if the pod's request value is set, more memory usage is allowed, so I wonder if it is appropriate to set the request value to the total memory of a Flink process.

And I think about this issue more, it might be better to keep the fields related to off-heap. This is because even if the *.memory.process.size is configured, the off heap memory usage of the flink process may not be completely limited. So It might be better to still have a buffer.