Separate java processes

netamego commented 2 years ago

Hi,

I have tried your amazing app. It's really good. I am very impressed with the ease of use and smooth operation. For Java profiling, using async-profiler is the best approach that could be taken.

The only functionality I have missed is separation by Java process. I have servers with a lot of Java processes, and in "Granulate Performance Studio" all Java code goes under a single java process. My apps have no .war files to identify them. Sometimes I need to know which process is responsible for CPU consumption.

Would it be possible to filter by PID, or to identify the app name in some way, for example with a Java system property like -Dgprofilerapp=myapp?

Thanks a lot.

Jongy commented 2 years ago

Hi @netamego !

First of all, thanks for the feedback :)

We currently have 2 mechanisms in place that can help you separate applications which get merged because their basenames match:

- Separation by container: in the UI you can view the graph of a single container / a single k8s deployment / ...
- appid: a concept we've introduced recently. An appid is an extra frame inserted in the graph between the bottom frame (e.g. java) and the first Java frame. This frame is generated by finding a logical identifier for the running process: for Java applications, it is the filename of the JAR file (so you can see something like appid: java /path/to/my.jar). For Python, it can be the executed Python script / package, etc.

Those 2 methods are IMO preferred over filtering by PIDs / any other tag you attach to your application (such as -Dgprofiler...) because they "just work", and they work cluster-wide (if you had to manually add PIDs on multiple machines, that'd quickly become a nuisance). As you said, ease of use is one of the top values here haha.

Do these methods I suggested help? If not, could you please explain why, and we'll try to think about other "automatic" separations that could work? I'm not against "manual" filtering, anyway - I just prefer to solve things the automated way. For your case, I suppose that letting you filter on the command line of a process (so that you could add -Dgprofiler and then profile only those processes) would do? We could definitely add this as an optional flag.

netamego commented 2 years ago

Hi @Jongy,

I totally agree with you. The more automatic everything is, the better. The problem is that it can be difficult to automatically manage many Java applications running on the same server, some of them with the same binary or .war name but different parameters. I think a system property (like -Dgprofilerapp=myapp) would be a bonus and would help a lot to organize large environments with many Java applications. On the other hand, there are times when the name of a binary is meaningless to the administrator, who knows the application by some kind of "proper" or "colloquial" name. Thank you very much for the great work you are doing and for answering so quickly.

Best regards

Jongy commented 2 years ago

1. > The problem is that it can be difficult to automatically manage many Java applications running on the same server, some of them with the same binary or .war name but different parameters.

2. > On the other hand, there are times when the name of a binary is meaningless to the administrator, who knows the application by some kind of "proper" or "colloquial" name.

These 2 are good examples of where our current appid mechanism is of no use; and of course, containers are irrelevant if the environment is not containerized.

What about this idea - to create something more generic, we can instead base it on environment variables. You'll have a flag to gprofiler e.g --env-var-id=GPROFILER_APP; then gProfiler will, for each profiled PID, check if that environment variable GPROFILER_APP exists, and if it does, it'll insert a frame envid: <value of that variable>. This is preferable to Java system properties, which are only for Java and not relevant for Python / Golang etc. Would that be useful for your case?
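To make the proposal concrete, here is a minimal sketch of the per-PID lookup, assuming Linux and read access to /proc/<pid>/environ. The function names are hypothetical and this is not gProfiler's actual implementation; the `envid:` frame text and the GPROFILER_APP variable name are taken from the suggestion above. Note that /proc/<pid>/environ shows the environment the process was started with, so variables set later from inside the process would generally not be visible.

```python
from typing import Dict, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry and malformed items
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def env_id_frame(pid: int, var_name: str = "GPROFILER_APP") -> Optional[str]:
    """Return 'envid: <value>' if the process has var_name set, else None.

    Hypothetical helper: var_name would come from a flag such as
    --env-var-id, which is only a proposal in this thread.
    """
    try:
        with open(f"/proc/{pid}/environ", "rb") as f:
            env = parse_environ(f.read())
    except OSError:
        return None  # process already exited, or insufficient permissions
    value = env.get(var_name)
    return f"envid: {value}" if value is not None else None
```

Processes without the variable simply get no extra frame, so they would keep being merged as they are today.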

netamego commented 2 years ago

First of all thank you very much for trying to understand the improvement that I propose.

I don't quite understand what you're saying. With an environment variable all the pids would go with the same application name, right?

I propose being able to filter by Java process. For example, we have WebLogic servers. A WebLogic server has a Java process (nodemanager) that starts all the other Java apps. It would be interesting to be able to filter the flamegraph for a specific Java process. I could give a system property (like -Dgprofilerapp=myapp) to those processes. In the case of WebLogic, I don't see how each Java process could have a specific name using an environment variable. Surely it is possible. I'll investigate a little more. Let's stay in touch. Perhaps someone else can contribute something more on this matter and discuss it.

Thanks a lot!!!

Best regards.

Jongy commented 2 years ago

> First of all thank you very much for trying to understand the improvement that I propose.

Of course! We're taking feedback and improvement suggestions very seriously.

> I don't quite understand what you're saying. With an environment variable all the pids would go with the same application name, right?
>
> I propose being able to filter by Java process. For example, we have WebLogic servers. A WebLogic server has a Java process (nodemanager) that starts all the other Java apps. It would be interesting to be able to filter the flamegraph for a specific Java process. I could give a system property (like -Dgprofilerapp=myapp) to those processes. In the case of WebLogic, I don't see how each Java process could have a specific name using an environment variable. Surely it is possible. I'll investigate a little more. Let's stay in touch. Perhaps someone else can contribute something more on this matter and discuss it.

I'll elaborate. The issue with filtering by PIDs is that PIDs are volatile - they change once you re-run the app; they represent a single process on a single machine. If we filter, or group, processes by PIDs - we are unable to merge stacks from different invocations of the same app, or from the same app running on multiple machines.
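As an illustration of the point about PIDs, here is a small sketch with made-up sample data (hosts, PIDs, stacks and counts are all invented for the example). It groups identical collapsed stacks once by (host, PID) and once by a stable application label, to show why only the latter merges across restarts and machines.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, NamedTuple


class Sample(NamedTuple):
    host: str
    pid: int
    app: str    # hypothetical stable label, e.g. the value of GPROFILER_APP
    stack: str  # collapsed stack, frames separated by ';'
    count: int


def merge(samples: Iterable[Sample], key: Callable[[Sample], Hashable]) -> Counter:
    """Sum sample counts for identical stacks that share the same grouping key."""
    merged: Counter = Counter()
    for s in samples:
        merged[(key(s), s.stack)] += s.count
    return merged


# Invented data: the same app on two hosts, plus a restart on host-a.
samples = [
    Sample("host-a", 4021, "myapp", "java;Main.run;Worker.step", 10),
    Sample("host-b", 977, "myapp", "java;Main.run;Worker.step", 5),
    Sample("host-a", 5150, "myapp", "java;Main.run;Worker.step", 7),
]

by_pid = merge(samples, key=lambda s: (s.host, s.pid))  # one entry per process
by_app = merge(samples, key=lambda s: s.app)            # one entry for the app
```

Grouping by PID keeps the three processes apart even though they run the same code, while grouping by the label sums them into a single stack for the app.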

My suggestion to use environment variables is actually very similar to your suggestion of Java system properties - the main difference is that the concept of environment variables can be applied to other runtimes as well (e.g. Python), which don't have the concept of system properties.

So, for example, while you propose that you start your app this way:

java ... -Dgprofilerapp=myapp myjar.jar

I now suggest you do it that way:

export GPROFILER_APP=myapp
java ... myjar.jar

and then you tell gProfiler to look at the value of GPROFILER_APP per profiled application: gprofiler --env-var-id=GPROFILER_APP ....

Also, it can be applied to e.g Python just as well: export GPROFILER_APP=mypythonapp; python .....

Different applications can have e.g export GPROFILER_APP=myapp5555 and thus they will be displayed separately in the UI.

Was my explanation clear now? If not, I'd be happy to elaborate more.

netamego commented 2 years ago

Hi, thanks so much for the explanation. It seems there are pros and cons to using either an environment variable or a system property. I understand your point of view. I am focusing on a WebLogic server where one Java app (nodemanager) starts the others. In that case it is easy to configure a system property for each app. It seems it's possible to do the same with an environment variable, but only by doing odd stuff in startManagedWeblogic.sh:

https://stackoverflow.com/questions/23620163/best-way-to-set-environmental-variables-in-weblogic-startup

Maybe, for Java, both methods can be implemented. If -Dgprofilerapp=myapp is not found, fall back to the environment variable, e.g. GPROFILER_APP=mypythonapp

Best regards.

Jongy commented 2 years ago

I see. Yes, I suppose we could implement system properties as an additional feature alongside the environment variable based approach. I will add it to our backlog and we'll see how this feature meets other use cases.
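The lookup order discussed above (system property first, environment variable as a fallback) could be sketched roughly as follows. This is an assumption-laden illustration rather than an implemented gProfiler feature: the function name is hypothetical, and the property and variable names are the ones used as examples in this thread.

```python
from typing import Mapping, Optional, Sequence


def resolve_app_name(
    cmdline: Sequence[str],
    env: Mapping[str, str],
    prop: str = "gprofilerapp",   # example property name from this thread
    var: str = "GPROFILER_APP",   # example variable name from this thread
) -> Optional[str]:
    """Prefer -D<prop>=<value> on the command line, else fall back to env[var]."""
    prefix = f"-D{prop}="
    for arg in cmdline:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return env.get(var)  # None when neither source provides a name
```

For example, `resolve_app_name(["java", "-Dgprofilerapp=myapp", "-jar", "myjar.jar"], {})` picks the property value, while a Python process with only GPROFILER_APP set gets the environment value. The scan is deliberately naive: it does not distinguish JVM options from arguments passed to the application itself, which a real implementation would probably need to handle.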

Granulate / gprofiler

Separate java processes #274