clojure-goes-fast / clj-async-profiler

Embedded high-precision Clojure profiler
http://clojure-goes-fast.com/kb/profiling/clj-async-profiler/
411 stars 17 forks source link

(NumberFormatException) at java.lang.NumberFormatException/forInputString #17

Closed SevereOverfl0w closed 2 years ago

SevereOverfl0w commented 4 years ago

Stacktrace pointed me to post-process-stacks:

NumberFormatException For input string: ""                                                                                                                                                                                                                  
        java.lang.NumberFormatException.forInputString (NumberFormatException.java:65)                                                                                                                                                                      
        java.lang.Long.parseLong (Long.java:601)                                                                                                                                                                                                            
        java.lang.Long.<init> (Long.java:965)                                                                                                                                                                                                               
        clj-async-profiler.post-processing/post-process-stacks/fn--89359 (NO_SOURCE_FILE:104)                                                                                                                                                               
        clj-async-profiler.post-processing/post-process-stacks (NO_SOURCE_FILE:98)                                                                                                                                                                          
        clj-async-profiler.post-processing/post-process-stacks (NO_SOURCE_FILE:92)                                                                                                                                                                          
        clj-async-profiler.core/generate-flamegraph (core.clj:265)                                                                                                                                                                                          
        clj-async-profiler.core/generate-flamegraph (core.clj:257)                                                                                                                                                                                          
        clj-async-profiler.core/stop (core.clj:321)                                                                                                                                                                                                         
        clj-async-profiler.core/stop (core.clj:290)                                                                                                                                                                                                          

The line was missing anything after the space?!

SevereOverfl0w commented 4 years ago

This seems related to something I require, but I don't have a clue why this would happen.

alexander-yakushev commented 4 years ago

Would your application profile be too sensitive to share? If I could look at the txt file that the profiler produces, that would be much easier.

If not, we'll have to debug it the hard way. Does this exception reproduce consistently?

SevereOverfl0w commented 4 years ago

I can share it privately. How can I get the text file?

It does reproduce consistently.

alexander-yakushev commented 4 years ago

You'll find it in /tmp/clj-async-profiler/results/ directory, with the name matching the current time. If you use serve-files web UI, you'll also find it there if you click "Show raw files".

You can send it to alex at bytopia org.

SevereOverfl0w commented 4 years ago

I was able to produce this for a non-sensitive frame like so:

(require '[clj-async-profiler.core :as prof])
(prof/profile (+ 1 1))

But within the context of the project. The resulting file was:

start_thread;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;/usr/lib/jvm/openjdk11/lib/server/libjvm.so;__GI_geteuid;[unknown];[unknown] 
alexander-yakushev commented 4 years ago

What's different in this setup from the previous setups you used clj-async-profiler in (I assume it worked for you at least once before)? Is it on Linux? Bare OS or inside Docker? What is the value of perf_event_paranoid? Are JDK debug symbols installed?

If the whole content of the file is just this line, then the profiler apparently can't extract any info at all.

SevereOverfl0w commented 4 years ago

I start my integrant system, which requires a bunch of namespaces. I literally just do (go) and before doing so it works, afterwards it doesn't.

I'll see if I can filter it down at all. I was hoping for known issue I think.

alexander-yakushev commented 4 years ago

Not really, I haven't seen such a thing before. I did observe empty tracefiles if the paranoid mode was misconfigured, but not a case where there is a stackframe without the number. Just FYI, this text file normally consists of lines that look like this:

foo;bar;baz;qux 42

where 42 is the number of samples with this exact stacktrace.

Could you launch a clean repl on that system with just clj-async-profiler, and try doing something like?

(prof/profile (dotimes [_ 100000] (reduce + (range 2000)))
SevereOverfl0w commented 4 years ago

I've managed to minimize the problem, it's caused by use of https://github.com/Deraen/sass4clj/blob/master/src/sass4clj/api.clj

With this deps.edn:

{:deps
 {deraen/sass4clj {:mvn/version "0.5.1"}
  com.clojure-goes-fast/clj-async-profiler {:mvn/version "0.4.1"}}
 :aliases
 {:dev {:jvm-opts ["-Djdk.attach.allowAttachSelf"]}}}

and this script.clj:

(require '[clj-async-profiler.core :as prof])
(require 'sass4clj.api)

;; (prof/profile (+ 1 1))

(sass4clj.api/build {:source-paths ["foo.scss"]
                     :target-path "foo.css"})

(prof/profile (+ 1 1))

Running clj -A:dev -i script.clj will cause the NumberFormatException. If you uncomment the first profile, then the build will cause a free() error of some kind! Not sure what's going on here :confused:

alexander-yakushev commented 4 years ago

Hi, I've run your example multiple times in multiple configurations, and could not reproduce it. I even gave it some example scss to make sure sass4clj.api/build really does something. Tested on JDK 1.8 and 11. All the resulting profiler txt files contain the correct stackframes and counts.

will cause the NumberFormatException

Yeah, that's still because of the corrupted txt file.

Could you please use (prof/profile (dotimes [_ 100000] (reduce + (range 2000))) as a test? There is a chance that (prof/profile (+ 1 1)) just doesn't have enough time to collect anything and produces corrupted files.

If you uncomment the first profile, then the build will cause a free() error of some kind!

That's something completely new 😄 Could you elaborate please? Is there any sort of trace/coredump/anything?

At this stage, can I ask you to try it with JDK 1.8? Also, would you be able to install JDK debug symbols (a package like openjdk-11-dbg, depending on your distro and package manager)? Thank you!

SevereOverfl0w commented 4 years ago

Tried the reduce, same problem.

Also, would you be able to install JDK debug symbols (a package like openjdk-11-dbg, depending on your distro and package manager)? Thank you!

I've tried this, but I'm still getting the below warnings. I've installed for both jdk8/11. It creates files like these on my filesystem:

/usr/lib/debug/usr/lib/jvm/java-1.8-openjdk/bin/java
/usr/lib/debug/usr/lib/jvm/java-1.8-openjdk/demo/jvmti/compiledMethodLoad/lib/libcompiledMethodLoad.so
/usr/lib/debug/usr/lib/jvm/java-1.8-openjdk/demo/jvmti/hprof/lib/libhprof.so

I tried to poke at the code to see how it detected debug symbols, thinking maybe there was an obvious something I needed to configure/symlink/etc. but I couldn't make heads or tails of it.

At this stage, can I ask you to try it with JDK 1.8?

/tmp/profilerissue
❯ java -version                                                                             
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b02)
OpenJDK 64-Bit Server VM (build 25.272-b02, mixed mode)

/tmp/profilerissue
❯ clj -A:dev -i script.clj
WARNING: When invoking clojure.main, use -M
Compiling {sass}... 
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
WARNING: Install JVM debug symbols to improve profile accuracy
Started [cpu] profiling

Syntax error (NumberFormatException) compiling at (script.clj:9:1).
For input string: ""

Full report at:
/tmp/clojure-4979107264673753037.edn

That's something completely new smile Could you elaborate please? Is there any sort of trace/coredump/anything?

/tmp/profilerissue 6s
❯ clj -A:dev -i script.clj                    
WARNING: When invoking clojure.main, use -M
WARNING: Install JVM debug symbols to improve profile accuracy
Started [cpu] profiling

Compiling {sass}... 
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
free(): invalid pointer
rlwrap: warning: clojure crashed, killed by SIGABRT.
rlwrap itself has not crashed, but for transparency,
it will now kill itself (without dumping core) with the same signal

warnings can be silenced by the --no-warnings (-n) option
zsh: abort      clj -A:dev -i script.clj
alexander-yakushev commented 4 years ago

Could you please do clojure -A:dev -i script.clj instead of clj? Is the hs_err_somethingsomething file created in the current directory?

alexander-yakushev commented 2 years ago

@SevereOverfl0w Hi Dominic! I wonder if you gave up on clj-async-profiler after this issue, or maybe it went away on its own?

SevereOverfl0w commented 2 years ago

Hey. I'm no longer working on the project that had this problem so it went away 😅

alexander-yakushev commented 2 years ago

That's one way to solve problems 😆. Thanks for the answer!