Open EnriqueL8 opened 3 years ago
There is a related discussion, but it wasn't going any further and time passed, so it would be nice to discuss it again. https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/288
Withspec.taskManager.{memoryOffHeapRatio,memoryOffHeapMin}
and spec.jobManager.{memoryOffHeapRatio,memoryOffHeapMin}
which are the cause of the problem, operator automatically set taskmanager.heap.size
and jobmanager.heap.size
. These were deprecated at 1.10 and 1.11 respectively, but the default value of the FlinkCluster fields interfered with the new memory configurations *.memory.process.size
.
I also think it's a good idea to remove the default values of offheap related fields to eliminate interference. Then you can clearly guide that those fields are only available in versions 1.9 and 1.10 or lower in the CRD document .
However, *.memory.process.size
is a required configuration, so if there is memory limit value, it seems necessary to set it to the same value. The configuration will be ignored in version 1.9 and below, so there should be no problem.
Agreed, doesn't it make more sense to set *.memmory.process.size
to the memory requested value instead of the memory limit for the container?
Could these environment variables also be deleted? https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/controllers/flinkcluster_converter.go#L117-L134
Even if the pod's request value is set, more memory usage is allowed, so I wonder if it is appropriate to set the request value to the total memory of a Flink process.
And I think about this issue more, it might be better to keep the fields related to off-heap. This is because even if the *.memory.process.size
is configured, the off heap memory usage of the flink process may not be completely limited. So It might be better to still have a buffer.
As of Flink 1.11, the properties for setting the offheap memory have changed and no longer match the ones set by the operator. Docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_migration.html#taskmanager-heap-size. These properties are always set due to the default webhook setting the
offHeapRatio
andoffHeapMin
fields.Has there been any discussions to remove the default memory settings since these would be different for each Flink version? Or to update the function to handle multiple versions? IMO harder to handle multiple versions, and easier to let the user configure all the properties directly to match the Flink version they are using.
There is a TODO comment in the API to watch for https://issues.apache.org/jira/browse/FLINK-13980 since this has been resolved I believe these fields are no longer needed.