apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

[feature] add memory monitoring and profiling policy to healthmgr #2454

Open · huijunw opened this issue 6 years ago

huijunw commented 6 years ago

We have occasionally observed stmgr and heron-instance processes running out of memory. However, the scheduler restarted the containers, so we had no chance to profile the process memory.

I propose a feature that adds a policy to monitor and profile process memory, built on top of the Dhalion/healthmgr framework:

- detector: monitor stmgr/heron-instance memory.
- diagnoser: if a process's memory usage is too high, trigger the resolver.
- resolver: start profiling the process memory for 1 minute, overwriting the previous profile.

This policy keeps the most recent copy of the process memory profile before the scheduler restarts the container.
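To make the detector → diagnoser → resolver flow concrete, here is a minimal, self-contained Java sketch. It does not use the real Dhalion or healthmgr interfaces: `MemorySample`, `Profiler`, `Resolver`, the 0.9 usage threshold, and the `/tmp/...hprof` path are all hypothetical placeholders chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed memory policy. None of these types are
 * real Dhalion or Heron healthmgr APIs; they only illustrate the
 * detector -> diagnoser -> resolver flow.
 */
public class MemoryProfilingPolicySketch {

  /** One memory reading for a process such as a stmgr or a heron-instance. */
  public static final class MemorySample {
    final String processId;
    final long usedBytes;
    final long limitBytes;

    public MemorySample(String processId, long usedBytes, long limitBytes) {
      this.processId = processId;
      this.usedBytes = usedBytes;
      this.limitBytes = limitBytes;
    }

    double usageRatio() {
      return (double) usedBytes / limitBytes;
    }
  }

  /** Hypothetical hook: profiles a process for the given duration, returns the dump location. */
  public interface Profiler {
    String profile(String processId, long durationMillis);
  }

  /** Detector: keeps the most recent sample seen for each process. */
  public static Map<String, MemorySample> detect(List<MemorySample> samples) {
    Map<String, MemorySample> latest = new HashMap<>();
    for (MemorySample s : samples) {
      latest.put(s.processId, s); // later samples replace earlier ones
    }
    return latest;
  }

  /** Diagnoser: selects processes whose usage ratio exceeds the (assumed) threshold. */
  public static List<String> diagnose(Map<String, MemorySample> latest, double threshold) {
    List<String> tooHigh = new ArrayList<>();
    for (MemorySample s : latest.values()) {
      if (s.usageRatio() > threshold) {
        tooHigh.add(s.processId);
      }
    }
    return tooHigh;
  }

  /** Resolver: profiles each flagged process for one minute, overwriting its previous profile. */
  public static final class Resolver {
    static final long PROFILE_DURATION_MILLIS = 60_000L;
    private final Profiler profiler;
    private final Map<String, String> lastProfile = new HashMap<>();

    public Resolver(Profiler profiler) {
      this.profiler = profiler;
    }

    public void resolve(List<String> processIds) {
      for (String id : processIds) {
        // put() replaces any older entry, so only the latest profile is kept
        lastProfile.put(id, profiler.profile(id, PROFILE_DURATION_MILLIS));
      }
    }

    public String lastProfileOf(String processId) {
      return lastProfile.get(processId);
    }
  }

  public static void main(String[] args) {
    List<MemorySample> samples = List.of(
        new MemorySample("stmgr-1", 500L, 1000L),
        new MemorySample("heron-instance-3", 200L, 1000L),
        new MemorySample("stmgr-1", 950L, 1000L));
    // 0.9 is an assumed threshold for "too high", not a value from this issue
    List<String> flagged = diagnose(detect(samples), 0.9);
    Resolver resolver = new Resolver((id, ms) -> "/tmp/" + id + ".hprof");
    resolver.resolve(flagged);
    System.out.println(flagged + " -> " + resolver.lastProfileOf("stmgr-1"));
  }
}
```

Running `main` should flag only `stmgr-1`, because its latest sample (950 of 1000 bytes) crosses the assumed threshold. Keying the stored profiles by process id is one simple way to get the "overwrite the last profile" behavior: each new profile replaces the previous one, so at most one copy per process is retained.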

Thoughts? @ashvina @avflor @srkukarni @objmagic @maosongfu

huijunw commented 6 years ago

Relates to https://github.com/twitter/heron/pull/2005