elastic / support-diagnostics

Support diagnostics utility for elasticsearch and logstash

Support diagnostic utility not working on Oracle Linux 8.10 #766

Open surm7 opened 1 week ago

surm7 commented 1 week ago

A user is facing issues capturing a diagnostic when running the utility from an Oracle Linux 8.10 host. It works fine from an Oracle Linux 7 host.

The error they get is:

2024-10-22T11:53:21.435120Z main WARN The Logger co.elastic.support.rest.ElasticRestClientInputs was created with the message factory org.apache.logging.log4j.message.ReusableMessageFactory@39655d3e and is now requested with a null message factory (defaults to org.apache.logging.log4j.message.ParameterizedMessageFactory), which may create log events with unexpected formatting.
Closing loggers.
Archiving diagnostic results.
Archive: -20241022-115321.zip was created
FATAL ERROR occurred: Couldn't create zip archive.. Check diagnostics.log in the archive file for more detail.
co.elastic.support.diagnostics.DiagnosticException: Couldn't create zip archive.
at co.elastic.support.util.ArchiveUtils.createZipArchive(ArchiveUtils.java:37) ~[diagnostics-9.2.0.jar:9.2.0]
at co.elastic.support.BaseService.createArchive(BaseService.java:71) ~[diagnostics-9.2.0.jar:9.2.0]
at co.elastic.support.diagnostics.DiagnosticService.exec(DiagnosticService.java:104) ~[diagnostics-9.2.0.jar:9.2.0]
at co.elastic.support.diagnostics.DiagnosticApp.main(DiagnosticApp.java:51) [diagnostics-9.2.0.jar:9.2.0]
Caused by: java.io.IOException: This archive contains unclosed entries.
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.finish(ZipArchiveOutputStream.java:928) ~[commons-compress-1.27.1.jar:1.27.1]
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.close(ZipArchiveOutputStream.java:571) ~[commons-compress-1.27.1.jar:1.27.1]
at co.elastic.support.util.ArchiveUtils.createZipArchive(ArchiveUtils.java:36) ~[diagnostics-9.2.0.jar:9.2.0]
... 3 more

JAVA_HOME is set. The script outputs a zip file that is corrupt and cannot be opened.

We understand that a guaranteed ETA cannot be promised regarding fixes or related discussions. However, we would like to submit this as a feature request for consideration.

pickypg commented 2 days ago

I have released Support Diagnostic v9.2.1, which has mostly just routine changes to dependencies (see the link for a list). The only one of potential interest is the upgrade to commons-io. Unfortunately, that is not as good as upgrading commons-compress would be, but we're already on its latest version, so there is nothing newer to upgrade to. You can see in the stacktrace that the error comes from commons-compress-1.27.1.jar, which is the Apache library that we use for compression.

Searching across the internet, I do not see a bunch of these errors suddenly appearing (nor has it been reported here by anyone else), which makes me wonder if there's something weird about OEL 8.x. However, I do see a few old variations of the error in other software.

Would it be possible to review the diagnostics.log from the generated folder to see if an error occurred earlier? I am curious if an error was triggered earlier in the processing of the diagnostic that stopped a file from being closed, which could be triggering this as a secondary error.

Also, looking at the generated output, the name of the archive is suspicious:

Archive: -20241022-115321.zip was created

-20241022-115321.zip should have a prefix in front of it. Also, that line comes after we have completed filling up the archive, with the error happening when we try to [automatically] close the output stream.

https://github.com/elastic/support-diagnostics/blob/e7b22377a527fbf71dad65587a8d046848bcc236/src/main/java/co/elastic/support/util/ArchiveUtils.java#L27-L37

Line 34 is where we log the full filename, which comes from line 27. Line 37 is where we throw this exception, after the output stream from line 32 is closed automatically as we exit the try-with-resources block.
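
To illustrate that flow, here is a minimal sketch of how a failure in the automatic close can surface only after the "was created" line has been logged. This is not the actual ArchiveUtils code: StrictArchive and UnclosedEntryDemo are made-up names, and the stand-in class only imitates a stream that refuses to finish while an entry is still open.

```java
import java.io.IOException;

// Hypothetical stand-in for an archive stream that, like the one in the
// stack trace, refuses to finish while an entry is still open.
class StrictArchive implements AutoCloseable {
    private boolean entryOpen = false;

    void putEntry(String name) { entryOpen = true; }

    void closeEntry() { entryOpen = false; }

    @Override
    public void close() throws IOException {
        if (entryOpen) {
            throw new IOException("This archive contains unclosed entries.");
        }
    }
}

class UnclosedEntryDemo {
    // Returns "ok", or the wrapped message chain the caller would see.
    // skipCloseEntry simulates an earlier problem that left an entry open.
    static String archive(boolean skipCloseEntry) {
        try (StrictArchive out = new StrictArchive()) {
            out.putEntry("diagnostics.log");
            if (!skipCloseEntry) {
                out.closeEntry();
            }
            // The "was created" message would be logged here, before close().
        } catch (IOException e) {
            // close() ran on leaving the try block and threw; wrap it the
            // way the stack trace shows (outer exception, inner cause).
            RuntimeException wrapped =
                new RuntimeException("Couldn't create zip archive.", e);
            return wrapped.getMessage() + " <- " + wrapped.getCause().getMessage();
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(archive(false));
        System.out.println(archive(true));
    }
}
```

In this sketch the body of the try block finishes normally in both cases, so the close failure is the first exception the caller sees. That would match the stack trace, where the IOException from close is the cause rather than a suppressed exception.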

When running the wizard locally (with the just released, but unchanged in this regard, v9.2.1), using the default output directory, I get:

# Note: the pwd was /Users/pickypg/dev/elastic/support-diagnostics/target/diagnostics-9.2.1
Archive: /Users/pickypg/dev/elastic/support-diagnostics/target/diagnostics-9.2.1/api-diagnostics-20241111-232908.zip was created

Note that the prefix is the full directory followed by the type of diagnostic selected. In my case I left the default temp directory blank and proceeded. Perhaps something entered in the wizard for the directory is invalid for OEL 8.x?
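
If it helps narrow that down, a quick standalone check of the directory value could look something like the sketch below. It is not part of the diagnostic tool, OutputDirCheck is a made-up name, and it only tests the basics (blank, missing, not a directory, not writable) for whatever path was typed into the wizard.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of support-diagnostics: reports the first
// obvious problem with a candidate output directory, or "ok".
class OutputDirCheck {
    static String check(String dir) {
        if (dir == null || dir.trim().isEmpty()) {
            return "blank";
        }
        Path p = Paths.get(dir.trim()).toAbsolutePath().normalize();
        if (!Files.exists(p)) {
            return "missing: " + p;
        }
        if (!Files.isDirectory(p)) {
            return "not a directory: " + p;
        }
        if (!Files.isWritable(p)) {
            return "not writable: " + p;
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Pass the directory entered in the wizard as the first argument.
        System.out.println(check(args.length > 0 ? args[0] : ""));
    }
}
```

Running it as the same OS user that runs the diagnostic matters, since the writability result depends on that user's permissions.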

pickypg commented 2 days ago

I have a pre-release of 9.2.2 put up to see if I could perhaps find a solution to this separate from the weird filename issue above (which could be related still).

https://github.com/elastic/support-diagnostics/releases/tag/v9.2.2-SNAPSHOT

If the user could give this a shot to see if it works, that would help. They can ignore the "the latest release is 9.2.1" notice that will appear as they run it.