Netflix-Skunkworks / gcviz

Garbage Collector Visualization Tool/Framework

How to configure this for remote scraping? #2

Open shawnchasse opened 11 years ago

shawnchasse commented 11 years ago

I'd like to configure this to look at remote gc log files instead of those running locally. What do I need to do to get the remote configuration set up? It's not obvious at this point from glancing at the source files. I'm also not very familiar with CGI, so getting this set up was somewhat difficult, but I managed to get it running appropriately as far as I can tell.

Any help would be appreciated.

mooreb commented 11 years ago

As it is currently factored, gcviz doesn't look at remote gc logs; it only reads the logs on the host where it runs.

When gcviz was initially written it supported two modes, a remote log scraper and a local log scraper, but few at Netflix used the remote log scraping facility, so we simplified it.

It is still possible to do something like:

- scp the remote-data-collection folder to the host to examine
- run collect-remote-data.sh on the remote host
- pull back the generated report

but this requires a good working knowledge of scp/ssh and for keys to be properly set up. And it runs the processing on the machine with the logs in question anyway.
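Something along these lines would do it (an untested sketch; the host name, /tmp staging paths, and report directory are placeholders, not anything gcviz defines):

    # Untested sketch of the scp/ssh workflow above. The host name and
    # the /tmp paths are placeholders; adjust to your environment.
    import subprocess

    def collect_remote_report(host):
        # 1. Copy the remote-data-collection folder to the host to examine.
        subprocess.check_call(
            ['scp', '-r', 'remote-data-collection', host + ':/tmp/'])
        # 2. Run collect-remote-data.sh on the remote host.
        subprocess.check_call(
            ['ssh', host, '/tmp/remote-data-collection/collect-remote-data.sh'])
        # 3. Pull the generated report back to this machine.
        subprocess.check_call(
            ['scp', '-r', host + ':/tmp/gcviz-report', '.'])

    collect_remote_report('app-host.example.com')

As noted, this still assumes ssh keys are already in place.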

What's the reason that you want to bring the logs to the visualizer instead of running the visualizer on the node with the logs? Understanding that might make it easier to help you. Please let me know.

Thanks much in advance and hoping this finds you well,

b

shawnchasse commented 11 years ago

Thanks for the quick reply. Too bad the remote scraping isn't in there already.

The main reason I want remote scraping is the infrastructure I'm able to work with. Our services run on Windows systems and use GlassFish as the web service container. This would be harder to implement in that environment than on a single Linux machine that can pull the logs off of each of the servers in question. Also, I'm trying to run this experimentally in production, so installing it on the servers isn't as desirable as it would be if it were ready for production profiling.

We also have a very distributed server infrastructure. For instance, one of the services I want to query runs across 9 servers and a different one runs across 10. If other people wanted to use this tool, that server count would balloon, as we have at least 100 different services.

I may be able to ship the logs off of the desired systems with a simple on-system application that runs on a daily basis, for instance, and then modify this script to read from an upload directory. Just an idea, sketched below.
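For example, a shipper as small as this could run daily on each Windows box (a sketch; the upload share and paths are just my idea, not anything gcviz defines):

    # Sketch of the daily log-shipping idea; UPLOAD_ROOT and the gc.log
    # location are hypothetical and environment-specific.
    import datetime
    import pathlib
    import shutil

    UPLOAD_ROOT = pathlib.Path(r'\\logserver\gcviz-uploads')

    def ship_log(hostname, gc_log_path):
        # One directory per host per day, so a reader can walk the uploads.
        day = datetime.date.today().isoformat()
        dest = UPLOAD_ROOT / hostname / day
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(gc_log_path, dest / 'gc.log')

    ship_log('appserver01', pathlib.Path(r'C:\glassfish\logs\gc.log'))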

Thanks, Shawn

mooreb commented 11 years ago

Shawn,

I wondered if you were using Windows. The visualization subsystem is written in Python and could run on Windows (it might not, yet, but I would consider it a bug if it didn't), but the data scraping subsystem is pretty heavy in shell and a little Perl; in my experience that suggests that Cygwin (or something similar) would be needed.

We don't use Windows heavily at Netflix, which is why gcviz is tuned for use under Linux.

I think your idea of shipping the logs to a centralized Linux server for visualization/processing makes a lot of sense. It shouldn't be too hard to modify gcviz in your local environment to run on different local files. Let me know if you need/want help.

Hoping this finds you well,

b

shawnchasse commented 11 years ago

The only problem I'm really running into right now is the environment variable handling and its reliance on the Java PID, command line, etc. Some things just aren't working because of that, such as this line:

fullEnvDict = vmsgcvizutils.envFileAsDictionary(vmsGCReportDirectory + os.path.sep + 'env')

If you have any ideas as to the best way to proceed without this information that would be great.

mooreb commented 11 years ago

Hi Shawn,

The missing 'env' file is actually a pretty big problem. It suggests that the report is not being generated/collected correctly. I've made a patch to visualize-gc.py that should paper over this problem for you, but there's something bigger going on here.
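In case it helps, the shape of that guard is roughly this (illustrative only; load_env_dict is a made-up name, and the real change is in visualize-gc.py):

    # Illustrative fallback for a missing 'env' file; load_env_dict is a
    # made-up name, not the actual function in visualize-gc.py.
    import os

    import vmsgcvizutils  # gcviz's own helper module

    def load_env_dict(report_dir):
        env_path = os.path.join(report_dir, 'env')
        if not os.path.exists(env_path):
            # Paper over the missing file, but treat the report as suspect.
            return {}
        return vmsgcvizutils.envFileAsDictionary(env_path)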

shawnchasse commented 11 years ago

How is the env file generated? I have one, but one entry is missing a value and only has a key: QUERY_STRING=

So parsing that blows up. I've gotten past that issue by doing much of what you did, but I'm now working through what appears to be a formatting difference. Also, since the scripts couldn't get the JVM information, I might be seeing side effects of the boot time not being properly set.

mooreb commented 11 years ago

The env file is generated by this line in collect_remote_data.sh:

env | LANG=C sort > ${OUTPUTDIR}/env

I have submitted a patch to make the env parsing a little more robust.
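For reference, "more robust" here mostly means splitting on the first '=' only and tolerating empty values, roughly like this (a sketch, not the actual patch; the function name mirrors the one used above):

    # Sketch of tolerant env-file parsing: split on the first '=' only,
    # so entries like "QUERY_STRING=" (empty value) parse cleanly.
    def envFileAsDictionary(path):
        env = {}
        with open(path) as f:
            for line in f:
                line = line.rstrip('\n')
                if '=' not in line:
                    continue  # skip blank or malformed lines
                key, _, value = line.partition('=')  # value may be ''
                env[key] = value
        return env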

shawnchasse commented 11 years ago

Great, thanks for the patch. I was able to get PNGs generating yesterday, so the changes I've made have been met with some success. It appears most of the issues I was having at the end were related to the time zone being written in the gc.log file from the machine I was testing on. I've made that configurable to a degree and will try to make it selectable or smarter. My next step is to modify the parsing process to take in, as parameters, the values you were determining automatically, which I can't do since the process isn't local to the machine.
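For anyone else who hits this: the date stamps HotSpot writes with -XX:+PrintGCDateStamps include a UTC offset, so they can be parsed zone-aware instead of assuming the visualizer's local zone. A sketch of the idea (not my actual change; the sample line is illustrative):

    # Sketch: parse a gc.log date stamp such as
    # "2013-05-28T10:16:45.905-0400: 1.234: [GC ..." zone-aware.
    # Python 3: strptime's %z handles numeric UTC offsets like -0400.
    from datetime import datetime

    def parse_gc_timestamp(line):
        stamp = line.split(': ', 1)[0]
        # %z consumes the offset, yielding a timezone-aware datetime.
        return datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S.%f%z')

    print(parse_gc_timestamp('2013-05-28T10:16:45.905-0400: 1.234: [GC ...'))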

Once that is done, I want to make the directory structure that everything lives in easily configurable for different environments.