elastic / ece-support-diagnostics

Support diagnostics utility for Elastic Cloud Enterprise (ECE)
Apache License 2.0

Can we please capture JVM heap size as part of the diagnostics? #23

Closed Leaf-Lin closed 4 years ago

Leaf-Lin commented 5 years ago

This can be achieved with the following:

for i in allocators zookeeper-servers proxies directors constructors admin-consoles; do echo -n "$i" && curl -s -u "admin:$PW" "https://$ECE_URL:12443/api/v0/regions/ece-region/container-sets/${i}" | jq '.' | grep MEMORY; done

Output:

allocators            "ALLOCATOR_MEMORY_OPTIONS=-Xms512M -Xmx512M",
zookeeper-servers            "ZOOKEEPER_MEMORY_OPTIONS=-Xms256M -Xmx256M",
proxies            "PROXY_MEMORY_OPTIONS= ",
directors            "DIRECTOR_MEMORY_OPTIONS= ",
constructors            "CONSTRUCTOR_MEMORY_OPTIONS= ",
admin-consoles            "ADMINCONSOLE_MEMORY_OPTIONS=-Xms512M -Xmx512M",

This would help Support see whether the customer has deployed ECE with the minimum recommended heap size for each service, as suggested here: https://www.elastic.co/guide/en/cloud-enterprise/current/ece-prereqs-hardware.html

jpcarey commented 5 years ago

I looked into this briefly. These values are contained in the Env variable that gets passed to the docker container. The Env can hold values that we do not want to collect, so I'll open an issue with the Cloud team to discuss how we should whitelist data for either docker inspect or the container-sets API.

Thank you for opening this!
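
One way to picture the whitelisting idea is a filter that passes through only the `*_MEMORY_OPTIONS` entries and drops everything else in the Env. The sketch below is not what the diag does; `filter_memory_options` is a hypothetical helper name, and it works on raw text so that it makes no assumption about the JSON layout returned by docker inspect or the container-sets API.

```shell
# Hypothetical helper: keep only NAME_MEMORY_OPTIONS=value entries from
# whatever text arrives on stdin and discard every other Env value.
filter_memory_options() {
  # -o prints each match on its own line; the value ends at the closing quote.
  grep -oE '[A-Z_]+_MEMORY_OPTIONS=[^"]*' |
    sed 's/[[:space:]]*$//'   # trim trailing blanks left by empty values
}
```

For example, piping the output of `docker inspect` for a container, or the curl call above, into `filter_memory_options` would print only the memory option entries.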

geekpete commented 4 years ago

I had another look at this script and tested it in two ECE labs: a 1.1.4 baseline lab and a blank lab where I installed ECE 2.6.2.

It needed minor tweaks to run for me, and I also added a check for the jq dependency:

# Abort early if jq is not installed.
command -v jq >/dev/null 2>&1 || { echo >&2 "jq is required but not installed. Aborting."; exit 1; }

# Config
export ADMIN="root"
export ADMIN_PWD="some big ece password"
export ECE_URL="https://localhost:12443"

# Print the MEMORY_OPTIONS line for each container set.
for i in allocators zookeeper-servers proxies directors constructors admin-consoles; do
  printf '%s: ' "$i"
  curl --insecure -s -u "$ADMIN:$ADMIN_PWD" "${ECE_URL}/api/v0/regions/ece-region/container-sets/${i}" | jq '.' | grep "MEMORY_OPTIONS"
done

Note that for ECE 2.x the admin user is admin, not root.

Outputs:

ECE 1.1.4 baseline:

$ ./export.sh 
allocators:             "ALLOCATOR_MEMORY_OPTIONS=-Xms4G -Xmx4G",
zookeeper-servers:             "ZOOKEEPER_MEMORY_OPTIONS=-Xms4G -Xmx4G",
proxies:             "PROXY_MEMORY_OPTIONS=-Xms8G -Xmx8G",
directors:             "DIRECTOR_MEMORY_OPTIONS=-Xms1G -Xmx1G",
constructors:             "CONSTRUCTOR_MEMORY_OPTIONS=-Xms4G -Xmx4G",
admin-consoles:             "ADMINCONSOLE_MEMORY_OPTIONS=-Xms4G -Xmx4G",

All the right/recommended memory values are set.

ECE 2.6.2 with quickstart install:

$ ./export.sh 
allocators:             "ALLOCATOR_MEMORY_OPTIONS=-Xms1024M -Xmx1024M",
zookeeper-servers:             "ZOOKEEPER_MEMORY_OPTIONS=-Xms4096M -Xmx4096M",
proxies:             "ROUTE_SERVER_MEMORY_OPTIONS=",
directors:             "DIRECTOR_MEMORY_OPTIONS=",
constructors:             "CONSTRUCTOR_MEMORY_OPTIONS=",
admin-consoles:             "ADMINCONSOLE_MEMORY_OPTIONS=-Xms4096M -Xmx4096M",

Blanks presumably indicate that no memory setting was specified and the defaults are in play. You can also see that some values are set correctly, or are at least larger than the defaults used to be, because a fix in a recent 2.x version increased some of the defaults to avoid customer issues where quickstart was used to install a production environment.
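
The set-versus-blank distinction in the output above can be made explicit with a small filter. This is a rough sketch only: `report_memory_options` is a hypothetical name, and because it is still an open question whether a blank value means a built-in default applies, the function reports "not set" and nothing more.

```shell
# Hypothetical helper: read lines such as the export.sh output on stdin and
# report, per option, whether a heap setting was given at all.
report_memory_options() {
  grep -oE '[A-Z_]+_MEMORY_OPTIONS=[^"]*' | while IFS= read -r entry; do
    name=${entry%%=*}                        # text before the first '='
    value=$(printf '%s' "${entry#*=}" |      # text after the first '='
      sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    if [ -z "$value" ]; then
      printf '%s: not set\n' "$name"
    else
      printf '%s: %s\n' "$name" "$value"
    fi
  done
}
```

Running `./export.sh | report_memory_options` against the 2.6.2 lab would then flag the proxies, directors and constructors entries as not set.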

Are we saying that unwanted data might be captured on this single grepped MEMORY_OPTIONS line? The golang diag seemed a better approach, with no dependencies, but alas it's no longer maintained.

Support is caught between three options: using the old golang beta diag, which is no longer maintained, to capture information that the official diag misses; running both diags to get the big picture; or inventing custom scripts until the official diag also covers the missing information.

Personally, I'd say the memory settings and the ZooKeeper mntr output are two items that are critical to catching low-hanging-fruit problems up front when looking at a diagnostic.

That being said, depending on the ECE version, the captured ps.txt does seem to include at least some containers' memory settings. ECE 1.1 appears to show more than ECE 2.x:

ps axwww | grep "\-Xms"

This is already in the official ECE diag, so it might at least be a good smoke test for whether quickstart/default settings were used, even though it does not show all settings of all containers.

Leaf-Lin commented 4 years ago

Ah, I missed this, and I agree that the ps.txt output already contains it. FWIW, the following can be used to check against the minimum required heap size:

grep Xms elastic/ps.txt | awk -F Xms '{print$NF}' | awk '{print"-Xms"$1,$2,$NF}'
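
To take the comparison one step further, the extracted flags could be normalized to megabytes and checked against a threshold. The following is a minimal sketch under stated assumptions: `heap_to_mb` and `check_min_heap` are hypothetical helper names, and the minimum is passed in as a parameter to be taken from the hardware prerequisites page, since no single required value is given in this thread.

```shell
# Hypothetical helper: convert a JVM heap size (4G, 512M, 2048k or plain
# bytes) to whole megabytes.
heap_to_mb() {
  size=$1
  num=${size%[kKmMgG]}                 # strip a trailing unit letter, if any
  case $size in
    *[gG]) echo $((num * 1024)) ;;
    *[mM]) echo "$num" ;;
    *[kK]) echo $((num / 1024)) ;;
    *)     echo $((size / 1024 / 1024)) ;;   # no unit: value is in bytes
  esac
}

# Hypothetical helper: check_min_heap <minimum in MB>
# Reads text containing -Xmx flags on stdin (for example ps.txt) and marks
# each flag as OK or LOW relative to the given minimum.
check_min_heap() {
  min_mb=$1
  grep -oE -e '-Xmx[0-9]+[kKmMgG]?' | while IFS= read -r flag; do
    mb=$(heap_to_mb "${flag#-Xmx}")
    if [ "$mb" -lt "$min_mb" ]; then
      echo "LOW  $flag (${mb}MB < ${min_mb}MB)"
    else
      echo "OK   $flag (${mb}MB)"
    fi
  done
}
```

A call such as `check_min_heap 4096 < elastic/ps.txt` would then list every -Xmx flag found in the file; the 4096 here is only an illustrative threshold, and different services may need different minimums.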