AxonFramework / AxonFramework

Framework for Evolutionary Message-Driven Microservices on the JVM
https://axoniq.io/
Apache License 2.0

Revise gap logic of the `GapAwareTrackingToken` #1241

Open sgrimm opened 4 years ago

sgrimm commented 4 years ago

We had a bug in our code that caused us to publish thousands of events in a single unit of work. It ended up triggering some Axon behavior that brought our application to its knees.

If there is a very large gap in global index values, GapAwareTrackingToken ends up with a large list of missing IDs (as you'd expect). This causes huge amounts of memory to be allocated, and then the application grinds to a halt because it's spending all its time in full GC pauses that can't recover much memory.

I'm able to simulate this in a dev environment as follows:

  1. Make JdbcEventStorageEngine.appendEvents() sleep for a little while, say 10 seconds, after inserting the events (but still within the executeInTransaction() lambda) if the event list has more than a handful of events. You don't strictly need to do this but it'll make the reproduction of the problem more reliable.
  2. Start an application with a number of tracking event processors. You can see the bad behavior with just a single processor, but the problem gets more severe as the number of processors rises.
  3. In one thread, publish a small number of events at a shorter interval than the sleep delay in step 1.
  4. In another thread, start a UnitOfWork and publish a few thousand events inside it.

Test code I used for steps 3 and 4:

  public void spam() {
    BigDecimal obj = BigDecimal.ZERO;  // any non-primitive type will do

    // Step 3: a background thread that publishes small events at a shorter
    // interval than the sleep delay added to appendEvents() in step 1.
    Thread t = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        eventBus.publish(GenericEventMessage.asEventMessage(obj));
        try {
          Thread.sleep(100);
        } catch (InterruptedException ignore) {}
      }
    });

    // Step 4: a single UnitOfWork that publishes a few thousand events; the
    // background thread is started just before the large batch is committed.
    DefaultUnitOfWork.startAndGet(GenericEventMessage.asEventMessage(obj))
        .execute(
            () -> {
              for (int j = 0; j < 10000; j++) {
                eventBus.publish(GenericEventMessage.asEventMessage(obj));
              }

              t.start();
            });
  }

When JdbcEventStorageEngine.appendEvents() sleeps after inserting the big batch of events, and then the other thread publishes a new event of its own, the GapAwareTrackingToken for each tracking event processor will have a huge list of missing IDs. Then when the big batch gets committed, the application will start doing full GCs like crazy.

Running this under YourKit, I see that a huge percentage of the objects are ConcurrentSkipListMap$Node. Here's what I think is happening:

  1. The tracking event processor loads a batch of events from the table. As soon as the large UnitOfWork commits, there are suddenly a lot of new events all at once.
  2. For each event, it creates a TrackedDomainEventData that includes the current tracking token value.
  3. Each of those GapAwareTrackingTokens has its own copy of the gap list. So for a batch size of 100 and a gap of 10000, this means allocating gap lists with a total of roughly a million elements (a bit less than that since each copy of the list is one element shorter). These aren't GC-eligible because they are referenced by live objects.

This happens in parallel for each tracking event processor, and they all try to do it at the same time since they all get notified when the big batch is committed. ConcurrentSkipListSet has significant per-entry memory overhead and this causes a massive amount of memory to be allocated in a short amount of time.
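To make the arithmetic concrete, here is a minimal sketch of that copy-per-event behavior. This is an illustration only, assuming Axon 4's GapAwareTrackingToken API (newInstance(index, gaps), advanceTo(index, maxGapOffset), getGaps()); package names and signatures differ slightly between versions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.LongStream;

    import org.axonframework.eventhandling.GapAwareTrackingToken;

    public class GapCopyDemo {

      public static void main(String[] args) {
        // A token whose reader is still missing ~10,000 global indexes (the big uncommitted batch).
        List<Long> gaps = LongStream.range(0, 10_000).boxed().collect(Collectors.toList());
        GapAwareTrackingToken token = GapAwareTrackingToken.newInstance(10_000, gaps);

        // In the real flow, each TrackedDomainEventData in a batch of 100 holds its own
        // advanced token, and every advanced token carries its own copy of the gap set.
        List<GapAwareTrackingToken> perEventTokens = new ArrayList<>();
        long totalGapEntries = 0;
        for (long index = 10_001; index <= 10_100; index++) {
          token = token.advanceTo(index, 10_000);
          perEventTokens.add(token);
          totalGapEntries += token.getGaps().size();
        }

        // Roughly 100 tokens x ~10,000 gaps each: about a million live skip-list entries,
        // none of which are GC-eligible while the batch is being processed.
        System.out.println("gap entries held by one batch: " + totalGapEntries);
      }
    }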

While that's going on, there are also a lot of short-lived objects being created, which adds to the memory pressure:

  1. Serializing and deserializing the tracking tokens. The XStream serialization of the skiplist is pretty verbose and it ends up needing to create a ton of short-lived Strings every time it loads or stores a token (see the sketch after this list).
  2. Shipping the tokens to and from the database. Thanks to the verbose serialization, lots of hundreds-of-kilobytes-large byte[] get instantiated for (and by) JDBC.
  3. Copying the gap list. In addition to the problem described above, the way immutability is implemented in GapAwareTrackingToken means gap lists are constantly being copied around.
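As a rough way to observe point 1 directly, the sketch below serializes a token with a large gap set and prints the payload size. It assumes Axon 4's XStreamSerializer with default settings; newer versions may require an explicitly configured XStream instance, and the exact XML shape varies per version:

    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.LongStream;

    import org.axonframework.eventhandling.GapAwareTrackingToken;
    import org.axonframework.serialization.SerializedObject;
    import org.axonframework.serialization.xml.XStreamSerializer;

    public class TokenPayloadSize {

      public static void main(String[] args) {
        // A token with ~10,000 gaps, comparable to the situation described in this issue.
        Set<Long> gaps = LongStream.range(0, 10_000).boxed().collect(Collectors.toSet());
        GapAwareTrackingToken token = GapAwareTrackingToken.newInstance(10_000, gaps);

        XStreamSerializer serializer = XStreamSerializer.builder().build();
        SerializedObject<byte[]> serialized = serializer.serialize(token, byte[].class);

        // A payload of this size is written to and read from the token store on every token update.
        System.out.println("serialized token size: " + serialized.getData().length + " bytes");
      }
    }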

Recovering from this situation turns out to be quite nasty: You can't just restart the application because it will read the tracking tokens again and try to load the events again and immediately end up back in the same situation. The only choice we came up with while this was causing an outage for us was to reset all the tracking tokens and manually republish all events that the token reset caused the application to skip over.

To be clear: Obviously an application committing tens of thousands of events in a single UnitOfWork is not good practice, and it was caused by an application bug in our case. But it shouldn't cause event processing to melt down.

sgrimm commented 4 years ago

A few possible fixes:

  1. Make GapAwareTrackingToken store ranges rather than individual IDs. Then a contiguous gap of any arbitrary size will take constant memory. Obviously the worst-case behavior here (a "gap" of every other ID over a range) is still bad, but a small number of large gaps is probably a more common situation than a large number of small gaps. (A minimal sketch of this idea follows the list.)
  2. Store the gap list as long[] which will be very memory-efficient and also more compact when serialized.
  3. Use a persistent-collections library with memory-efficient copy-on-write semantics.
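For illustration, here is a minimal sketch of option 1: storing gaps as half-open ranges in a sorted map, so a contiguous gap of any size costs a single entry. This is a hypothetical data structure sketched for this discussion, not anything that exists in the framework:

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Hypothetical range-based gap set: each entry maps the first missing index of a
    // contiguous gap to the index just past its end, so a gap of 10,000 consecutive
    // ids costs one map entry instead of 10,000 set entries.
    public class GapRanges {

      // start of gap (inclusive) -> end of gap (exclusive)
      private final NavigableMap<Long, Long> ranges = new TreeMap<>();

      public void addGap(long fromInclusive, long toExclusive) {
        ranges.put(fromInclusive, toExclusive);
      }

      // Removes a single index once the corresponding event has been read,
      // splitting its range into left and right remainders if needed.
      public void fill(long index) {
        Map.Entry<Long, Long> range = ranges.floorEntry(index);
        if (range == null || index >= range.getValue()) {
          return; // not a known gap
        }
        ranges.remove(range.getKey());
        if (range.getKey() < index) {
          ranges.put(range.getKey(), index);       // left remainder
        }
        if (index + 1 < range.getValue()) {
          ranges.put(index + 1, range.getValue()); // right remainder
        }
      }

      public boolean isGap(long index) {
        Map.Entry<Long, Long> range = ranges.floorEntry(index);
        return range != null && index < range.getValue();
      }
    }

Merging adjacent ranges, bounding the map size, and making the structure immutable are left out for brevity.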

ghilainm commented 4 years ago

Exactly the same issue here...

Seems to be due to this wonderful (but probably necessary) query:

    select
        domaineven0_.global_index as col_0_0_,
        domaineven0_.type as col_1_0_,
        domaineven0_.aggregate_identifier as col_2_0_,
        domaineven0_.sequence_number as col_3_0_,
        domaineven0_.event_identifier as col_4_0_,
        domaineven0_.time_stamp as col_5_0_,
        domaineven0_.payload_type as col_6_0_,
        domaineven0_.payload_revision as col_7_0_,
        domaineven0_.payload as col_8_0_,
        domaineven0_.meta_data as col_9_0_ 
    from
        private.domain_event_entry domaineven0_ 
    where
        domaineven0_.global_index>? 
        or domaineven0_.global_index in (
            ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ? , ?
        ) 
    order by
        domaineven0_.global_index ASC limit ?

Any possible workaround in the meantime if we still need to use the tracking event processors?

smcvb commented 4 years ago

A workaround would be to ensure you have as few gaps as possible. Typically you should not have many gaps at all, but sometimes a misconfiguration in the sequence generator used for the domain_event_entry table causes them.

The scenario @sgrimm describes is definitely problematic, but 9 times out of 10 it is caused by reusing the sequence generator for more than just the domain_event_entry table. I'd check this first, @ghilainm.

A quicker solution would be to use Axon Server as the event store, as it ensures you will never have gaps. Thus, you'd never need the query you've shared.

ghilainm commented 4 years ago

@smcvb Thanks for your quick answer. I was planning to use the JdbcEventStore to limit Hibernate's side effects on memory, and I'll see if I can limit the number of gaps.

We already had only one sequence for the table. What is the recommended strategy for the sequence? Only allocate one value at a time? More? Wouldn't it be worth explaining this in the documentation (https://docs.axoniq.io/reference-guide/v/3.3/part-iv-advanced-tuning/performance-tuning#auto-increments-and-sequences)?

smcvb commented 4 years ago

A note stating that a dedicated sequence generator should be used for the globalIndex field of the domain_event_entry would indeed not hurt anybody.

As far as strategies go, the framework uses the plain @GeneratedValue annotation, which defaults to the GenerationType.AUTO strategy.
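For illustration only, this is what a dedicated, non-shared sequence generator looks like in plain JPA. The entity and all names below are hypothetical, and overriding the generator on the framework's own domain_event_entry mapping is typically done through an orm.xml rather than annotations (newer stacks use jakarta.persistence instead of javax.persistence):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.SequenceGenerator;

    @Entity
    public class ExampleEventEntry {

      @Id
      @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "domainEventEntrySequence")
      @SequenceGenerator(
          name = "domainEventEntrySequence",
          sequenceName = "domain_event_entry_seq", // used by this table only, never shared
          allocationSize = 1)                      // no pre-allocated blocks left unused on restart
      private long globalIndex;

      // other columns omitted for brevity
    }

A low allocationSize avoids the gaps that unused pre-allocated blocks can leave behind, at the cost of an extra database round trip per insert.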

mbreevoort commented 4 years ago

We used auto-increment with the JdbcEventStore and got gaps of 2 IDs on every insert, resulting in more than 900MB of EventStore objects in memory. In a post, Allard says the gaps should time out and disappear, but after hours they were still in memory. Is there a timeout?

The gaps in the IDs were due to the design of our Galera cluster with three MariaDB instances. After changing the configuration we no longer get gaps in the sequence, but we now have a possibility of duplicate keys...

smcvb commented 4 years ago

@mbreevoort both the JpaEventStorageEngine and JdbcEventStorageEngine allow for configuring the "gap timeout" and "gap cleaning threshold". The former defines when a gap is considered old and removable, whilst the latter defines the maximum number of gaps allowed before cleaning starts. By the way, this functionality was introduced in Axon 3.0.6.
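As a sketch of where those knobs live, both values can be set when building the storage engine. This assumes Axon 4's JpaEventStorageEngine builder (the Axon 3 configuration API differs), and the numbers shown are simply what I believe the defaults to be:

    import org.axonframework.common.jpa.EntityManagerProvider;
    import org.axonframework.common.transaction.TransactionManager;
    import org.axonframework.eventsourcing.eventstore.jpa.JpaEventStorageEngine;
    import org.axonframework.serialization.Serializer;

    public class StorageEngineConfig {

      public JpaEventStorageEngine storageEngine(EntityManagerProvider entityManagerProvider,
                                                 TransactionManager transactionManager,
                                                 Serializer serializer) {
        return JpaEventStorageEngine.builder()
            .entityManagerProvider(entityManagerProvider)
            .transactionManager(transactionManager)
            .eventSerializer(serializer)
            .snapshotSerializer(serializer)
            .gapTimeout(60_000)          // milliseconds after which a gap is considered "old" and removable
            .gapCleaningThreshold(250)   // number of gaps that triggers a cleaning run
            .build();
      }
    }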

smcvb commented 1 month ago

I have renamed this issue to "Revise gap logic of the GapAwareTrackingToken" to make it easier for us (AxonIQ team) to process the work for Axon Framework 5. Furthermore, I have added this issue to milestone "Release 5.0.0," as I want to investigate a simpler API for the GapAwareTrackingToken for AF5.

Lastly, it is important to note that there is a chance this issue will be closed as a "won't fix" if we find a clean implementation following issue #2747.