diennea / carapaceproxy

A Distributed Java Reverse Proxy
Apache License 2.0

Flaky tests #439

Open NiccoMlt opened 1 year ago

NiccoMlt commented 1 year ago

Some flaky tests were found while implementing #437:

| Report | Passed | Failed | Skipped | Time |
| --- | --- | --- | --- | --- |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheContentLengthLimitTest.xml | 1✅ | 1❌ | | 22s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheExpireTest.xml | 2✅ | 3❌ | | 33s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.NotModifiedTest.xml | | 1❌ | | 7s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.mapper.BasicStandardEndpointMapperTest.xml | 6✅ | 1❌ | | 48s |

See https://github.com/diennea/carapaceproxy/actions/runs/5005242440/jobs/8969440561

But for example here https://github.com/NiccoMlt/carapaceproxy/actions/runs/5005216296/jobs/8969398591 I see different tests failing:

| Report | Passed | Failed | Skipped | Time |
| --- | --- | --- | --- | --- |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheContentLengthLimitTest.xml | 1✅ | 1❌ | | 22s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CacheExpireTest.xml | 4✅ | 1❌ | | 32s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.CaffeineCacheImplTest.xml | 2✅ | 1❌ | | 3s |
| carapace-server/target/surefire-reports/TEST-org.carapaceproxy.server.cache.NotModifiedTest.xml | | 1❌ | | 7s |

hamadodene commented 3 months ago

After some investigation, I discovered a peculiar issue in the tests, and I believe it could be the cause of the failures in several flaky tests we've been experiencing. I'll try to explain it as clearly as possible.

The initial error was in the testAlwaysServeStaticContent test:

WARNING: [aed4c0cc] Unknown channel option 'TCP_KEEPCOUNT' for channel '[id: 0xaed4c0cc]'
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:56379
Caused by: java.net.ConnectException: Connection refused: no further information
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)

To try to resolve this, I attempted to set a fixed port for WireMock so that all tests would use the same port. In reality, the dynamics I explain below are most likely the same thing that happened here.

When the test testAlwaysServeStaticContent starts, I see from the logs that its backend has been configured as follows: INFO: configured backend foo localhost:56.385 enabled:true

However, from the error, we see that Carapace is trying to call a port we did not expect. Looking at the preceding test, I see that it had configured exactly the port that testAlwaysServeStaticContent is now calling:

INFO: configured backend foo localhost:56.379 enabled:true
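As an aside, the dotted port numbers in these logs ("56.385", "56.379") are almost certainly not strange ports: they look like default-locale (Italian) grouped number formatting of ports 56385 and 56379. A quick sketch of how that rendering happens:

```java
import java.util.Locale;

public class PortFormatting {
    public static void main(String[] args) {
        int port = 56385;
        // With an Italian default locale, grouped formatting renders the
        // port with a '.' thousands separator, as seen in the log message
        System.out.println(String.format(Locale.ITALY, "%,d", port)); // 56.385
        // A plain conversion keeps the port unambiguous
        System.out.println(Integer.toString(port)); // 56385
    }
}
```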

To better understand and resolve this, I tried to pick a free port myself and assign it to WireMock. After that, the issue described below occurred, but I would say it is the same underlying problem.


public class BasicStandardEndpointMapperTest {

    static int port;

    static {
        // Probe for a free ephemeral port; WireMock binds it later
        try (ServerSocket s = new ServerSocket(0)) {
            port = s.getLocalPort();
        } catch (IOException e) {
            // Fail fast instead of silently leaving port = 0
            throw new ExceptionInInitializerError(e);
        }
    }

    @Rule
    public WireMockRule backend1 = new WireMockRule(port);
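Note that this probe-and-close pattern is inherently racy: the port is only guaranteed free at the instant the probe socket closes, and anything else (including another test's server) can bind it before WireMock does. A stdlib-only sketch of the window:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortRace {
    public static void main(String[] args) throws IOException {
        int probedPort;
        // The probe-and-close pattern: the port is free only at the
        // instant the probe socket closes
        try (ServerSocket probe = new ServerSocket(0)) {
            probedPort = probe.getLocalPort();
        }
        // The race window: before WireMock binds probedPort, anyone
        // else can grab it
        try (ServerSocket stealer = new ServerSocket(probedPort)) {
            System.out.println("port " + probedPort + " was grabbed by someone else");
        }
        // new WireMockRule(probedPort) would now fail to bind
    }
}
```

If I'm not mistaken, WireMock can also pick the port itself (its dynamic-port configuration) and expose it after startup, which closes the window entirely.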

I identified the problem by simultaneously running all the tests in the BasicStandardEndpointMapperTest class and analyzing the two tests, testServeACMEChallengeToken and testAlwaysServeStaticContent, where the latter is failing.

As you can see from my screenshot, the testServeACMEChallengeToken is executed before the testAlwaysServeStaticContent. Quite often, the testAlwaysServeStaticContent fails with the following error:

(screenshot: test execution order, with testServeACMEChallengeToken running before testAlwaysServeStaticContent)


Feb 16, 2024 3:51:54 PM org.carapaceproxy.core.Listeners bootListener
INFO: started listener at HostPort[host=0.0.0.0, port=1425]: reactor.netty.transport.ServerTransport$InetDisposableBind@402f8592
Feb 16, 2024 3:51:54 PM org.carapaceproxy.server.cache.CacheByteBufMemoryUsageMetric start
INFO: Starting cache ByteBufAllocator usage, period: 5 seconds
request /not-exists.html
action match default-testServeAcme
Feb 16, 2024 3:51:54 PM org.carapaceproxy.core.RequestsLogger logRequest
SEVERE: Request [2024-02-16 15:51:54.430] [GET localhost:1425 /not-exists.html] [uid:, sid:, ip:127.0.0.1] server=127.0.0.1, act=PROXY, route=default-testServeAcme, backend=localhost:49201. time t=0ms b=-1708095114430ms, protocol=HTTP/1.1 not logged to access log because RequestsLogger is closed
Feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
WARNING: [d555f853] Unknown channel option 'TCP_KEEPCOUNT' for channel '[id: 0xd555f853]'
Feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
WARNING: [d555f853] Unknown channel option 'TCP_KEEPINTERVAL' for channel '[id: 0xd555f853]'
Feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn
WARNING: [d555f853] Unknown channel option 'TCP_KEEPIDLE' for channel '[id: 0xd555f853]'
Feb 16, 2024 3:51:54 PM com.github.tomakehurst.wiremock.common.Slf4jNotifier error
SEVERE: 
                                               Request was not matched
                                               =======================

-----------------------------------------------------------------------------------------------------------------------
| Closest stub                                             | Request                                                  |
-----------------------------------------------------------------------------------------------------------------------
                                                           |
GET                                                        | GET
/seconda.html                                              | /not-exists.html                                    <<<<< URL does not match
                                                           |
                                                           |
-----------------------------------------------------------------------------------------------------------------------

feb 16, 2024 3:51:59 PM org.eclipse.jetty.server.AbstractConnector doStop

In theory, this should not be possible, since the route configuration specifies that requests matching `.*not-exists.*` should be directed to an action serving a static page.


public void testAlwaysServeStaticContent() throws Exception {

        stubFor(get(urlEqualTo("/seconda.html"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "text/html")
                        .withBody("it <b>works</b> !!")));

        try (HttpProxyServer server = new HttpProxyServer(null, tmpDir.newFolder())) {

            {
                Properties configuration = new Properties();
                configuration.put("backend.1.id", "foo");
                configuration.put("backend.1.host", "localhost");
                configuration.put("backend.1.port", String.valueOf(port));
                configuration.put("backend.1.enabled", "true");

                configuration.put("director.1.id", "*");
                configuration.put("director.1.backends", "*");
                configuration.put("director.1.enabled", "true");

                configuration.put("listener.1.host", "0.0.0.0");
                configuration.put("listener.1.port", "1425");
                configuration.put("listener.1.ssl", "false");
                configuration.put("listener.1.enabled", "true");

                configuration.put("route.10.id", "default");
                configuration.put("route.10.enabled", "true");
                configuration.put("route.10.match", "all");
                configuration.put("route.10.action", "proxy-all");

                configuration.put("action.1.id", "serve-static");
                configuration.put("action.1.enabled", "true");
                configuration.put("action.1.type", "static");
                configuration.put("action.1.file", CLASSPATH_RESOURCE + "/test-static-page.html");
                configuration.put("action.1.code", "200");

                configuration.put("route.1.id", "static-page");
                configuration.put("route.1.enabled", "true");
                configuration.put("route.1.match", "request.uri ~ \".*index.*\"");
                configuration.put("route.1.action", "serve-static");

                configuration.put("action.2.id", "static-not-exists"); // file not exists
                configuration.put("action.2.enabled", "true");
                configuration.put("action.2.type", "static");
                configuration.put("action.2.file", CLASSPATH_RESOURCE + "/not-exists.html");
                configuration.put("action.2.code", "200");

                configuration.put("route.2.id", "static-not-exists");
                configuration.put("route.2.enabled", "true");
                configuration.put("route.2.match", "request.uri ~ \".*not-exists.*\"");
                configuration.put("route.2.action", "static-not-exists");

                PropertiesConfigurationStore config = new PropertiesConfigurationStore(configuration);
                server.configureAtBoot(config);
            }

            server.start();

To understand the flow, I renamed the routes so that they differed between the two tests. From the logs, it is evident that requests originating from the testAlwaysServeStaticContent method reach the server configured by the testServeACMEChallengeToken test.

In the testServeACMEChallengeToken test I set:

  configuration.put("route.10.id", "default-testServeAcme");
  configuration.put("route.10.enabled", "true");
  configuration.put("route.10.match", "all");
  configuration.put("route.10.action", "proxy-all");

and I added some logs in ProxyRequestsManager (method processRequest):

   public Publisher<Void> processRequest(ProxyRequest request) {
        request.setStartTs(System.currentTimeMillis());
        request.setLastActivity(request.getStartTs());

        parent.getFilters().forEach(filter -> filter.apply(request));

        System.out.println("request " + request.getUri());
        MapResult action = parent.getMapper().map(request);
        if (action == null) {
            LOGGER.log(Level.INFO, "Mapper returned NULL action for {0}", this);
            action = MapResult.internalError(MapResult.NO_ROUTE);
        }
        // Log the route only after the null check, to avoid an NPE
        System.out.println("action match " + action.getRouteId());

And when I run testAlwaysServeStaticContent, the result is:

INFO: Starting cache ByteBufAllocator usage, period: 5 seconds
request /not-exists.html
action match default-testServeAcme
Feb 16, 2024 3:51:54 PM org.carapaceproxy.core.RequestsLogger logRequest
SEVERE: Request [2024-02-16 15:51:54.430] [GET localhost:1425 /not-exists.html] [uid:, sid:, ip:127.0.0.1] server=127.0.0.1, act=PROXY, route=default-testServeAcme, backend=localhost:49201. time t=0ms b=-1708095114430ms, protocol=HTTP/1.1 not logged to access log because RequestsLogger is closed
Feb 16, 2024 3:51:54 PM reactor.util.Loggers$Slf4JLogger warn

I cannot comprehend why this happens. Perhaps the HTTP server started by testServeACMEChallengeToken has not closed properly by the time testAlwaysServeStaticContent starts? In theory, though, in that case testAlwaysServeStaticContent should not even start, since the port would still be occupied.

Or maybe there is another reason that I am not seeing?
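One quick way to check the "previous server still bound" hypothesis between tests is to probe the listener port directly; a hypothetical helper (isListening is not part of the codebase, just a sketch):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if something is accepting connections on host:port. */
    static boolean isListening(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 250);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Port 1 is virtually never open locally, so this prints false;
        // calling isListening("127.0.0.1", 1425) before server.start()
        // would reveal a leftover listener from the previous test
        System.out.println(isListening("127.0.0.1", 1));
    }
}
```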

hamadodene commented 3 months ago

For the issue encountered above, I found some information here https://github.com/reactor/reactor-netty/issues/495#issuecomment-464837492 and https://github.com/reactor/reactor-netty/commit/152476f7144e300191b051c5773265a6a5c2f63d.

It seems that when disposeNow() is called, only inactive connections are closed, so active ones can keep the channel open. To handle this, you can follow the workaround suggested in the comments. In essence, it's necessary to ensure that all tasks are completed.
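For intuition, the ordering the workaround enforces ("stop accepting new work, then wait for in-flight tasks before closing") can be sketched with a plain ExecutorService; this is only an analogy, not the reactor-netty API used in the fix:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        workers.submit(() -> {
            // simulated in-flight request
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
        });
        // Analogous to the fix: stop accepting new work, then wait for
        // active tasks to complete before tearing the server down
        workers.shutdown();
        boolean finished = workers.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("all tasks finished: " + finished);
    }
}
```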

I tried the fix in this PR https://github.com/diennea/carapaceproxy/pull/457, and it seems to be working.

hamadodene commented 3 months ago

@NiccoMlt For the issue of flaky cache tests, I found something interesting here https://github.com/ben-manes/caffeine/issues/568.

It seems to be our issue. However, I'm having difficulty reproducing it; it's very random.

NiccoMlt commented 3 months ago

Great find @hamadodene, we are getting closer to the solution!

hamadodene commented 3 months ago

@NiccoMlt I analyzed one of the flaky cache tests and found something interesting. The test I examined is: org.carapaceproxy.server.cache.CaffeineCacheImplTest --> testMaxSize()

Occasionally, the test fails with the following error:

java.lang.AssertionError:
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
     but: was null
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
Actual: null

I added some logs to understand the dynamics. The final code with logs is as follows:


 @Test
    public void testMaxSize() throws Exception {
        final int maxSize = 1000;
        initializeCache(maxSize);

        long sleepBetweenPuts = 100;

        List<CacheEntry> entries = new ArrayList<>();
        int c = 0;
        long totSize = 0;
        boolean ok = false;
        while (!ok) {
            CacheEntry e = genCacheEntry("res_" + (c++), 100, 0);
            System.out.println("Adding element " + e.key.uri + " of size=" + e.getMemUsage());
            entries.add(e);
            System.out.println("Payload to add  for key " + e.key  + " is: " + e.payload);
            cache.put(e.key, e.payload);
            System.out.println("payload at this point is " + e.payload);
            CachedContent result = cache.get(e.key);
            System.out.println("Result at this point is for " + e.key + " is_  "+ result);
            assertThat(result, is(e.payload));
            System.out.println(" > cache.memSize=" + cache.getMemSize());
            totSize += e.getMemUsage();
            // In the last insertion cache.getMemSize() will exceed the maximum
            if (totSize <= maxSize) {
                assertThat(cache.getMemSize(), equalTo(totSize));
            } else {
                ok = true;
            }
            runEviction(cache, null);
            Thread.sleep(sleepBetweenPuts);
        }

        // Now we should have put as many elements in the cache as it can contain, plus one. The first one should be evicted.
        runEviction(cache, 1);
        assertThat(evictedResources.size(), is(1));

        // Doesn't work. It currently seems that the eviction process always evicts the last entered key, which is
        // undoubtedly a serious issue
//        for (int i=0; i<entries.size(); i++) {
//            if (i == 0) {
//                assertThat(evictedResources.get(0), equalTo(entries.get(i)));
//                assertThat(evictedResources.get(0).removalCause, equalTo(RemovalCause.SIZE));
//            } else {
//                assertThat(cache.get(entries.get(i).key), equalTo(entries.get(i).payload));
//            }
//        }
//        entries.remove(0);
        CacheEntry evictedEntry = evictedResources.get(0);
        assertTrue(entries.remove(evictedEntry));

        assertThat((int) cache.getMemSize(), equalTo(entries.stream().mapToInt(e -> (int) e.getMemUsage()).sum()));
        assertThat((int) cache.getSize(), is(entries.size()));

        assertThat((int) stats.getDirectMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.payload.directSize).sum()));
        assertThat((int) stats.getHeapMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.payload.heapSize).sum()));
        assertThat((int) stats.getTotalMemoryUsed(), is(entries.stream().mapToInt(e -> (int) e.getMemUsage()).sum()));
    }

The result of the test is as follows:

Adding element res_0 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_0} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_0 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_0} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:41.349, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.332, size=100 (heap=50, direct=50)}
 > cache.memSize=314
Adding element res_1 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_1} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_1 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_1} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:44.482, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.434, size=100 (heap=50, direct=50)}
 > cache.memSize=628
Adding element res_2 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_2} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_2 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_2} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:45.511, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.535, size=100 (heap=50, direct=50)}
 > cache.memSize=942
Adding element res_3 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_3} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_3 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}
Cache: FINE content res_3: removed due to max size exceeded
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_3} is_  null

java.lang.AssertionError: 
Expected: is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
     but: was null
Expected :is <ContentPayload{chunks_n=0, creationTs=2024-02-25 22:37:46.54, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.636, size=100 (heap=50, direct=50)}>
Actual   :null

As you can see, in the last iteration of the while loop we get this log: Cache: FINE content res_3: removed due to max size exceeded. In essence, the problem is that the content is removed from the cache before the assertThat is executed, because the maximum size of the cache has been exceeded.

The timing of the removal may vary. Therefore, depending on when the removal is done for size exceeded, the test may pass or fail.
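This is consistent with Caffeine performing size-based eviction as an asynchronous maintenance task rather than inline with put(), so the subsequent get() races the removal. A stdlib-only sketch of the losing schedule (no Caffeine involved; the latch forces eviction to win, which Caffeine's scheduling only does sometimes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class AsyncEvictionRace {
    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("res_3", "payload");
        CountDownLatch evicted = new CountDownLatch(1);
        // Eviction runs on another thread, like Caffeine's maintenance task
        Thread maintenance = new Thread(() -> {
            cache.remove("res_3"); // "removed due to max size exceeded"
            evicted.countDown();
        });
        maintenance.start();
        // Depending on scheduling, a get() may see the value or null;
        // here we deliberately wait so the failing schedule is reproduced
        evicted.await();
        System.out.println("get after eviction: " + cache.get("res_3"));
        maintenance.join();
    }
}
```

For deterministic tests, the Caffeine project suggests running maintenance on the calling thread, e.g. building the cache with a same-thread executor (Caffeine.newBuilder().executor(Runnable::run)), so eviction completes before put() returns.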

When the test passes, we observe something like the following, where you can see that the value is removed from the cache after the assertThat has been executed.


Adding element res_0 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_0} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_0 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_0} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:38:33.007, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:58.848, size=100 (heap=50, direct=50)}
 > cache.memSize=314
Adding element res_1 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_1} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_1 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_1} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:34:59.296, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.013, size=100 (heap=50, direct=50)}
 > cache.memSize=628
Adding element res_2 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_2} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_2 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_2} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:00.604, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.115, size=100 (heap=50, direct=50)}
 > cache.memSize=942
Adding element res_3 of size=314
Payload to add  for key ContentKey{scheme=, method=, host=, uri=res_3} is: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
Cache: FINE adding content res_3 payload: ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
payload at this point is ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
Result at this point is for ContentKey{scheme=, method=, host=, uri=res_3} is_  ContentPayload{chunks_n=2, creationTs=2024-02-25 22:35:01.633, lastModified=1970-01-01 01:00:00.0, expiresTs=2024-02-21 15:48:59.216, size=100 (heap=50, direct=50)}
 > cache.memSize=1256
Cache: FINE content res_3: removed due to max size exceeded
NiccoMlt commented 3 months ago

Again, great write-up!

Inspecting the provided block, the last part of the while loop seems interesting:

CachedContent result = cache.get(e.key);
System.out.println("Result at this point is for " + e.key + " is_  "+ result);
assertThat(result, is(e.payload));
System.out.println(" > cache.memSize=" + cache.getMemSize());
totSize += e.getMemUsage();
// In the last insertion cache.getMemSize() will exceed the maximum
if (totSize <= maxSize) {
    assertThat(cache.getMemSize(), equalTo(totSize));
} else {
    ok = true;
}

You say that in the last loop iteration, when we reach the max size, sometimes the cache is cleaned up and sometimes it isn't, making the assertion assertThat(result, is(e.payload)) fail because the get actually returns no payload, since the cache was already cleaned.

Couldn't we just move this assertion inside the if branch? Something like:

if (totSize <= maxSize) {
    assertThat(result, is(e.payload));
    assertThat(cache.getMemSize(), equalTo(totSize));
} ...
hamadodene commented 3 months ago

@NiccoMlt What you're saying makes sense. I will create a dedicated branch for fixing these flaky tests directly in the main repository so that you can contribute without having to use my fork.