Publishers continually retrying

sirmspencer commented 11 months ago

We are seeing really high memory when using this ring buffer. I'm not an expert in the class you are using so I have a question for you. It looks like clear creates a new ring buffer that is empty. How is clear disposing of the old buffer? This looks like its not.

  (clear [this]
    (RingBuffer. counter (empty buffer)))

BrunoBonacci commented 11 months ago

Could you please list which publishers you are using?

As mentioned in the comment for https://github.com/clj-commons/ring-buffer/issues/15 the GC action will release the memory if no references are found which still points to the old buffer.

I use µ/log in high-volume production systems that are running for months without being restarted, if mulog was generating even the slightest memory leak it would have emerged many times in the past. So I'm pretty confident about µ/log's core.

However there are a few things that might not be obvious:

1) each publisher has an internal ring-buffer which could be sized smaller if needed 2) do not send request/response payload via mulog, the body of the request might be large and compound very quickly, the actual body is mostly of little or no analytical value

If unsure about the size of some values check the estimated size with mm/measure

Let's try to narrow down so we can get to the bottom of the issue.

sirmspencer commented 11 months ago

You were right on with the request/response payloads. We were seeing a manual GC work so that's why it felt like the ring buffer wasn't collecting correctly.

I added response payloads recent, but size limited everything. One route was accidentally logging image responses. It was a proxy route, so it was a byte array body. These byte arrays get converted to string before being passed to the size limiter. I think it was something to do with that image as a string that was causing the to json conversion for mulog to fail and then that value and / or the ring buffer it was on wasn't collected automatically.

sirmspencer commented 11 months ago

My current overview of this is that errors need to be handled in the publisher so it can return the empty buffer back. I am able to replicate this locally I think. One of our publishers doesn't handle errors and I can see it keep retrying the same message. Also when using the built in zipkin publisher I am seeing the same thing. Unhandled json errors keeps retrying.

{:publisher-type :zipkin
    :publisher-id "4uEa5W-2L6M3z1mWzZ0P0e8Z9MFJwuXu"
    :exception #error {  :cause "JSON encoding error - Map keys must be strings"  :via  [{:type charred.CharredException    :message "JSON encoding error - Map keys must be strings"    :at [charred.JSONWriter writeMap "JSONWriter.java" 189]}]  :trace  [[charred.JSONWriter writeMap "JSONWriter.java" 189]   
    [charred.api$reify__55254 accept "api.clj" 607]   
    [charred.JSONWriter writeObject "JSONWriter.java" 159]   
    [charred.JSONWriter writeMap "JSONWriter.java" 203]   
    [charred.api$reify__55254 accept "api.clj" 607]   
    [charred.JSONWriter writeObject "JSONWriter.java" 159]   
    [charred.JSONWriter writeArray "JSONWriter.java" 170]   
    [charred.api$reify__55254 accept "api.clj" 605]   
    [charred.JSONWriter writeObject "JSONWriter.java" 159]   
    [charred.api$write_json_fn$fn__55266 invoke "api.clj" 677]   
    [com.brunobonacci.mulog.common.json$eval55288$with_output_to_str__55289$fn__55290 invoke "json.clj" 53]   
    [com.brunobonacci.mulog.common.json$eval55288$to_json__55295 invoke "json.clj" 66]   
    [com.brunobonacci.mulog.publishers.zipkin$post_records invokeStatic "zipkin.clj" 87]   
    [com.brunobonacci.mulog.publishers.zipkin$post_records invoke "zipkin.clj" 79]   
    [com.brunobonacci.mulog.publishers.zipkin.ZipkinPublisher publish "zipkin.clj" 164]   
    [com.brunobonacci.mulog.core$start_publisher_BANG_$publish_attempt__26584 invoke "core.clj" 194]   
    [clojure.core$binding_conveyor_fn$fn__5823 invoke "core.clj" 2050]   
    [clojure.lang.AFn applyToHelper "AFn.java" 154]   
    [clojure.lang.RestFn applyTo "RestFn.java" 132]   
    [clojure.lang.Agent$Action doRun "Agent.java" 114]   
    [clojure.lang.Agent$Action run "Agent.java" 163]   
    [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1144]   
    [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 642]   
    [java.lang.Thread run "Thread.java" 1583]]}}

BrunoBonacci / mulog

Publishers continually retrying #117