apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Memory allocation issue with setComponentRam() assigned between 192 - 256 MB #2803

Open aahmed-se opened 6 years ago

aahmed-se commented 6 years ago

Setting up a topology with the following component RAM setting results in an error in Heron:

com.twitter.heron.api.Config.setComponentRam(conf, "sentence", ByteAmount.fromMegabytes(192));

The problem seems to occur only when the RAM setting is in the 192-256 MB range.

heron_internals.yaml --override_config_file=./heron-conf/override.yaml --ckptmgr_port=58984 --ckptmgr_id=ckptmgr-1
[2018-03-22 14:42:05 -0700] [INFO]: container_1_split_2 stdout: Invalid maximum heap size: -Xmx0M
[2018-03-22 14:42:05 -0700] [INFO]: container_1_split_2 stdout: Error: Could not create the Java Virtual Machine.
[2018-03-22 14:42:05 -0700] [INFO]: container_1_split_2 stdout: Error: A fatal exception has occurred. Program will exit.
[2018-03-22 14:42:05 -0700] [INFO]: container_1_split_2 stdout: 
[2018-03-22 14:42:05 -0700] [INFO]: Logging pid 94266 to file stmgr-1.pi
aahmed-se commented 6 years ago

This seems to be a problem with ResourceCompliantRRPacking: instances with a RAM config between 192 and 256 MB end up with an -Xmx of 0 in the instance plan. The problem seems to be in PackingUtils.java:

  public static Resource computeTotalResourceChange(TopologyAPI.Topology topology,
                                                    Map<String, Integer> componentChanges,
                                                    Resource defaultInstanceResources,
                                                    ScalingDirection scalingDirection) {
    double cpu = 0;
    ByteAmount ram = ByteAmount.ZERO;
    ByteAmount disk = ByteAmount.ZERO;
    Map<String, ByteAmount> ramMap = TopologyUtils.getComponentRamMapConfig(topology);
    Map<String, Integer> componentsToScale = PackingUtils.getComponentsToScale(
        componentChanges, scalingDirection);
    for (String component : componentsToScale.keySet()) {
      int parallelismChange = Math.abs(componentChanges.get(component));
      cpu += parallelismChange * defaultInstanceResources.getCpu();
      disk = disk.plus(defaultInstanceResources.getDisk().multiply(parallelismChange));
      if (ramMap.containsKey(component)) {
        ram = ram.plus(ramMap.get(component).multiply(parallelismChange));
      } else {
        ram = ram.plus(defaultInstanceResources.getRam().multiply(parallelismChange));
      }
    }
    return new Resource(cpu, ram, disk);
  }
ashvina commented 6 years ago

This issue is not caused by RCRR packing.

I launched a topology with 192 MB of RAM allocated to the counter bolt and found the following in the executor logs:

[INFO]: component name: counter, ram request: 201326592, total jvm size: 192M, cache size: 64M, metaspace size: 128M

The executor received the correct RAM size, which means packing worked as expected.

However, the issue still exists because the process fails to launch:

[INFO]: container_1_counter_3 stdout: Invalid maximum heap size: -Xmx0M

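The culprit is https://github.com/twitter/heron/blob/master/heron/executor/src/python/heron_executor.py#L562, where the heap size is computed by subtracting fixed code cache and metaspace sizes from the total JVM size: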

    # TO DO (Karthik) to be moved into keys and defaults files
    code_cache_size_mb = 64
    java_metasize_mb = 128
...
      total_jvm_size = int(self.component_ram_map[component_name] / (1024 * 1024))
      heap_size_mb = total_jvm_size - code_cache_size_mb - java_metasize_mb
      Log.info("component name: %s, ram request: %d, total jvm size: %dM, "
               "cache size: %dM, metaspace size: %dM"
               % (component_name, self.component_ram_map[component_name],
                  total_jvm_size, code_cache_size_mb, java_metasize_mb))
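
To make the arithmetic concrete, here is a minimal standalone sketch of the same subtraction. It is not the executor code itself: the function name `heap_size_mb` and the sample RAM values other than 192 MB are assumptions for illustration, while the two constants and the formula are copied from the excerpt above.

```python
# Constants copied from the heron_executor.py excerpt above.
CODE_CACHE_SIZE_MB = 64
JAVA_METASIZE_MB = 128


def heap_size_mb(ram_request_bytes):
    """Hypothetical helper mirroring the excerpt: total JVM size minus fixed overheads."""
    total_jvm_size = int(ram_request_bytes / (1024 * 1024))
    return total_jvm_size - CODE_CACHE_SIZE_MB - JAVA_METASIZE_MB


# 201326592 bytes is the "ram request" value from the executor log, i.e. 192 MB.
print("ram request 201326592 -> -Xmx%dM" % heap_size_mb(201326592))

# Illustrative sample values (assumed, not taken from the issue).
for ram_mb in (192, 256, 512):
    print("ram=%dM -> -Xmx%dM" % (ram_mb, heap_size_mb(ram_mb * 1024 * 1024)))
```

With a 192 MB request the two fixed sizes (64 + 128 = 192) consume the whole allocation, so the computed heap is 0, which matches the `-Xmx0M` in the failing launch.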

@kramasamy may know more about the reason.

@jerrypeng @nwangtw, you may want to look at this issue before adding any new configs to correctly estimate the JVM size.