AdamBien / lightfish

Payara / GlassFish Monitoring Utility

Improved multi instance support #11

Closed rveldpau closed 11 years ago

rveldpau commented 11 years ago

This pull request makes three main changes:

  1. Staggered Data Collection
  2. Parallel Data Collection fixes
  3. Fault Tolerances

Staggered Data Collection

This pull request improves multiple-instance support by staggering the requests rather than trying to pull all data at once. The following diagram shows how the new data collection works for multiple instances.

[Diagram: data-collection-timeline]

The diagram assumes the interval is set to two seconds. Each time the interval elapses, data collection starts for the next instance. Once a collection has been started for every server instance, the system waits for all of them to complete, then combines the results into an overall snapshot and sends it on its merry way.
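To make the timeline concrete, here is a minimal sketch of the staggering idea in plain Java. It is not the code from this pull request (which uses asynchronous EJBs); `StaggeredCollectionSketch`, `collectStaggered`, the instance names, and the `collector` function are all hypothetical names chosen for illustration, and a `ScheduledExecutorService` stands in for the container-managed scheduling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; not the LightFish implementation.
final class StaggeredCollectionSketch {

    /**
     * Starts one collection per instance, each delayed by a further
     * intervalMillis, then waits for all of them and returns the
     * results in instance order (the "overall snapshot").
     */
    static <T> List<T> collectStaggered(List<String> instances,
                                        long intervalMillis,
                                        Function<String, T> collector) {
        // One thread per instance so slow collections can overlap.
        ScheduledExecutorService scheduler =
                Executors.newScheduledThreadPool(Math.max(1, instances.size()));
        try {
            List<CompletableFuture<T>> pending = new ArrayList<>();
            for (int i = 0; i < instances.size(); i++) {
                String instance = instances.get(i);
                CompletableFuture<T> future = new CompletableFuture<>();
                // Instance i starts i intervals after the first one.
                scheduler.schedule(() -> {
                    try {
                        future.complete(collector.apply(instance));
                    } catch (RuntimeException e) {
                        future.completeExceptionally(e);
                    }
                }, i * intervalMillis, TimeUnit.MILLISECONDS);
                pending.add(future);
            }
            // Wait for every collection before combining the results.
            List<T> combined = new ArrayList<>();
            for (CompletableFuture<T> future : pending) {
                combined.add(future.join());
            }
            return combined;
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        // Instance names and the fake collector are assumptions for the demo.
        List<String> instances = Arrays.asList("instance1", "instance2", "instance3");
        List<String> snapshot =
                collectStaggered(instances, 200, name -> name + ": collected");
        System.out.println(snapshot);
    }
}
```

The key point is that the start times are spread out while the final join still waits for every instance, so the combined snapshot is only produced once all collections have finished.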

Parallel Data Collection fixes

When I switched from Fork/Join to asynchronous EJBs I introduced a few bugs and inefficiencies. The main two were the reprocessing of already processed data points and the starting of extra actions. The first bug caused the application to fail to persist any Snapshots because of a duplicate key for the Applications. The second used extra resources for no purpose because of some extra code left hanging around from refactoring. I also added the max threads per instance option to the advanced configuration page.

Fault Tolerances

When contacting the GlassFish server for statistics, every once in a while (more often with parallel data collection enabled) the server fails to respond properly. Instead of failing in these cases, the system can now retry the failed calls, up to a maximum number of attempts. This option is configurable on the advanced configuration page.
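The retry behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: `RetrySketch` and `callWithRetry` are hypothetical names, and a failed call is modelled simply as a `RuntimeException` thrown by the supplier.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the LightFish implementation.
final class RetrySketch {

    /**
     * Invokes the call until it succeeds or maxAttempts attempts have
     * been made, then rethrows the last failure.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw lastFailure; // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated server that fails twice before responding.
        String stats = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("no response");
            }
            return "stats";
        }, 5);
        System.out.println(stats + " after " + calls[0] + " attempts");
    }
}
```

A real collector would probably want to retry only on specific failures (for example, I/O errors) and might pause between attempts; both are left out here to keep the sketch short.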

rveldpau commented 11 years ago

Just as a note, on my deployed instance this changes the CPU usage from spiking to 300% (3 full cores) to a steady 30-50%, while reducing the total time taken from 15 seconds to 10.

rveldpau commented 11 years ago

Hey Adam, is there a reason you haven't merged this? If you'd like any changes before you merge it I'd happily make them.

AdamBien commented 11 years ago

Consider it as merged. For some reason I cannot merge it from the web site. I will have to fire up a command line :-)

Thanks again for your great work. I'm waiting for the resolution of the GF bug: https://java.net/jira/browse/GLASSFISH-19677

AdamBien commented 11 years ago

Thanks!