apache / incubator-heron

Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter
https://heron.apache.org/
Apache License 2.0

Add support for JVM debugging port #1936

Open billonahill opened 7 years ago

billonahill commented 7 years ago

When developing in Heron I often want to connect a Java debugger to the process. We should make it easy to enable the debug port for the submitter, the runtime manager, and the instance processes. The current approach is to modify the various shell commands that start these processes to include Java debug options.

srkukarni commented 7 years ago

@billonahill I thought we could already get both jstack and jmap information from heron-shell. Could you please explicitly specify the capability that you need?

maosongfu commented 7 years ago

For aurora scheduler, it seems we have that port already: https://github.com/twitter/heron/blob/master/heron/config/src/yaml/conf/aurora/heron.aurora#L31

srkukarni commented 7 years ago

This is very inconsistent behaviour. Why do we have this for one scheduler and not for the others?

billonahill commented 7 years ago

@srkukarni some schedulers might support different features than others w.r.t. the ability to support profiling, heap/thread dumping, etc. Ideally they would all support these things but we don't require that they all do currently.

That's a separate issue from what I was referring to, though. I want to be able to step through code in a debugger in my IDE, which means being able to insert JVM args like -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 into the java command, for example.

Here's how I've manually hacked into the heron_executor.py _get_java_instance_cmd method (in my working branch) to make a java instance start with these args: https://github.com/twitter/heron/pull/1807/files#diff-2a669e084143b86b5766a49245f8e357

                       '-XX:+UseConcMarkSweepGC',
                       '-XX:ParallelGCThreads=4',
-                      '-Xloggc:log-files/gc.%s.log' % instance_id]
+                      '-Xloggc:log-files/gc.%s.log' % instance_id.replace("$", "")]
+      if global_task_id == 3: # Used to enable debugging of a specific instance
+        instance_cmd =\
+            instance_cmd + ["-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"]
+
       instance_cmd = instance_cmd + self.instance_jvm_opts.split()
       if component_name in self.component_jvm_opts:
         instance_cmd = instance_cmd + self.component_jvm_opts[component_name].split()

Being able to pass an argument, or perhaps set an environment variable, when submitting a topology to get this effect would be really helpful when developing/debugging in local mode.
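A minimal sketch of what an environment-variable switch could look like in place of the hard-coded `global_task_id == 3` check above. The variable names `HERON_DEBUG_TASK_ID` and `HERON_DEBUG_PORT` and both helper functions are hypothetical; they are not existing Heron settings or APIs.

```python
import os


def jdwp_option(port, suspend=True):
    """Build the JDWP agent argument for a JVM that listens on `port`."""
    return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=%s,address=%d" % (
        "y" if suspend else "n", port)


def maybe_add_debug_opts(instance_cmd, global_task_id, env=None):
    """Append JDWP options only for the task id selected via the environment.

    HERON_DEBUG_TASK_ID and HERON_DEBUG_PORT are assumed names used for
    illustration only.
    """
    env = os.environ if env is None else env
    target = env.get("HERON_DEBUG_TASK_ID")
    if target is None or str(global_task_id) != target:
        # Not the instance being debugged: leave the command untouched.
        return instance_cmd
    port = int(env.get("HERON_DEBUG_PORT", "5005"))
    return instance_cmd + [jdwp_option(port)]
```

With something like this, the executor would call `maybe_add_debug_opts(instance_cmd, global_task_id)` at the point where the diff currently appends the hard-coded argument, and every other instance would start normally.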

srkukarni commented 7 years ago

But don't we already have ways to add more parameters to the JVM, namely TOPOLOGY_WORKER_CHILDOPTS and TOPOLOGY_COMPONENT_JVMOPTS?

billonahill commented 7 years ago

@srkukarni I suspect we could inject the params with those settings, but we'd also need to support injecting them only on the specific instance that we want to debug, probably by instance id. We wouldn't want them to all start with debug options since a.) the user would then need to connect to each one to allow them to start; and b.) we'd have port collisions.
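One way to avoid the port collisions described above, sketched under assumptions: debug only an explicit list of task ids and give each its own port derived from the id. The `HERON_DEBUG_TASK_IDS` variable and the `base_port + task id` scheme are illustrative choices, not anything Heron defines.

```python
import os

JDWP_TEMPLATE = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=%d"


def debug_ports(env=None, base_port=5005):
    """Map each requested task id to a distinct debug port.

    Reads a comma-separated list such as "1,3" from the hypothetical
    HERON_DEBUG_TASK_IDS variable and assigns base_port + task id.
    """
    env = os.environ if env is None else env
    raw = env.get("HERON_DEBUG_TASK_IDS", "")
    task_ids = [int(t) for t in raw.split(",") if t.strip()]
    return {task_id: base_port + task_id for task_id in task_ids}


def debug_opts_for(global_task_id, env=None):
    """Return the JDWP argument list for one instance, or [] if not selected."""
    port = debug_ports(env).get(global_task_id)
    return [JDWP_TEMPLATE % port] if port is not None else []
```

Instances that are not listed get an empty list, so they neither wait for a debugger to attach nor bind a port.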

Also, this ticket captures adding debug support to the submitter, runtime manager, and the scheduler-as-a-library as well.

srkukarni commented 7 years ago

You are right that we don't have any way to pass instance-specific command line params, but by definition the instance id/task id is not something known a priori and thus cannot be part of the API per se. Having said that, isn't the purpose of heron-shell to do these kinds of profiler/debugger attachments? I also don't follow what you mean by 'adding debug support to the submitter, runtime manager, and the scheduler-as-a-library'. A more specific use case would help me understand. Thanks!

billonahill commented 7 years ago

To attach a remote debugger (i.e., an IDE) to a Java process for debugging, you must start the JVM with specific debug options. Hence the heron shell is not applicable here; we must actually start the instance processes with the arguments injected.

When a topology is submitted or updated, a number of java processes get started: SubmitterMain, RuntimeManagerMain and SchedulerMain to deploy/update the topology and then the actual HeronInstances that make up the topology. While developing or debugging locally, it's often convenient to attach a debugger to any of these java processes.

That's what this ticket is for: making it easy to do so without having to modify source code to inject the arguments. For example, to attach a debugger to SchedulerMain running locally, you currently need to modify the code in schedulerCommand.schedulerCommand(..). It would be great if instead I could set a local env variable that the Heron CLI picks up to make this happen.
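A rough sketch of how a CLI-side helper could pick up such a variable when it builds the java command for one of these processes. The names `HERON_DEBUG_MAIN`, `HERON_DEBUG_PORT`, and `with_debug_opts` are assumptions made for illustration and do not exist in the Heron CLI.

```python
import os


def with_debug_opts(java_cmd, main_class, env=None):
    """Return a copy of java_cmd with JDWP options added if main_class is selected.

    java_cmd is a list such as ["java", "-cp", "...", main_class, ...].
    HERON_DEBUG_MAIN (hypothetical) holds the simple class name to debug,
    for example "SchedulerMain".
    """
    env = os.environ if env is None else env
    simple_name = main_class.rsplit(".", 1)[-1]
    if env.get("HERON_DEBUG_MAIN") != simple_name:
        return list(java_cmd)
    port = env.get("HERON_DEBUG_PORT", "5005")
    agent = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=" + port
    # JVM options must precede the main class, so insert right after the executable.
    return [java_cmd[0], agent] + list(java_cmd[1:])
```

Because the selection is by class name, the same variable could cover SubmitterMain, RuntimeManagerMain, and SchedulerMain without touching the code that assembles each command.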

srkukarni commented 7 years ago

There are two kinds of people. 1) Heron users, that is, topology writers. For them the current TOPOLOGY_COMPONENT_JVMOPTS approach should work, right? 2) Heron developers, that is, people working to enhance Heron. Adding any kind of debugger support for components like SubmitterMain, RuntimeManagerMain, and SchedulerMain is aimed at this group. Am I to understand that this ticket is mainly aimed at the second group?

billonahill commented 7 years ago

That is correct. This is primarily for heron developers, like us. :)

maosongfu commented 7 years ago

I am thinking from actual use scenarios:

In most cases, people don't want to re-build and re-deploy a topology just to pick up those debugging jvm options; it would be awesome if we can restart a particular heron-instance with new jvm options at runtime.

So how about allowing heron-shell to accept a request to restart a particular heron-instance with new jvm options at runtime. (It can bring security concerns though)

billonahill commented 7 years ago

@maosongfu I haven't yet had the need to restart a running instance to add debug params. Instead I typically want to restart my local topology with a debug flag enabled. Restarting an instance of a running topology also means you can't debug what happens the first time it's launched as part of a newly submitted topology.

I suggest we support some sort of debug arg or env variable passed at submit time that can do this.

maosongfu commented 7 years ago

@billonahill sounds good.