apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Java][Flight] Flight SQL tests are flaky #41782

Open laurentgo opened 1 month ago

laurentgo commented 1 month ago

Describe the bug, including details regarding any error messages, version, and platform.

Several test failures in flight-sql module have been observed in multiple job executions:

The reported error is:

Error:  Errors: 
Error:    TestFlightSqlStreams.tearDown:224 » IllegalState Memory was leaked by query. Memory leaked: (250384)
Allocator(ROOT) 0/250384/250896/2147483647 (res/actual/peak/limit)
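
For context, this kind of failure is raised when an allocator is closed in tearDown while it still has outstanding bytes. Below is a minimal, stdlib-only sketch of that accounting check. `ToyAllocator` and its methods are hypothetical names made up for illustration; this is not Arrow's `BufferAllocator`, and the message format is only loosely modeled on the output above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy accounting allocator (hypothetical; NOT Arrow's BufferAllocator). */
final class ToyAllocator implements AutoCloseable {
    private final String name;
    private final long limit;
    private final AtomicLong actual = new AtomicLong(); // bytes currently outstanding
    private final AtomicLong peak = new AtomicLong();   // highest value of 'actual' seen

    ToyAllocator(String name, long limit) {
        this.name = name;
        this.limit = limit;
    }

    /** Record an allocation of the given size, enforcing the limit. */
    void allocate(long bytes) {
        long now = actual.addAndGet(bytes);
        if (now > limit) {
            actual.addAndGet(-bytes); // roll back the failed reservation
            throw new IllegalStateException("Allocation exceeds limit of " + limit);
        }
        peak.accumulateAndGet(now, Math::max);
    }

    /** Record that previously allocated bytes were released. */
    void release(long bytes) {
        actual.addAndGet(-bytes);
    }

    /** Fail loudly if anything is still outstanding, as a test tearDown would. */
    @Override
    public void close() {
        long leaked = actual.get();
        if (leaked != 0) {
            throw new IllegalStateException(
                "Memory was leaked. Memory leaked: (" + leaked + ") "
                    + "Allocator(" + name + ") " + leaked + "/" + peak.get() + "/" + limit
                    + " (actual/peak/limit)");
        }
    }
}
```

In this toy model, any buffer that is allocated but not released before `close()` turns into a test failure, which is why a release that races with tearDown could show up as an intermittent leak.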

Note that there are also multiple messages about unclosed ManagedChannels in flight-core module:

May 22, 2024 5:50:41 AM io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference cleanQueue
SEVERE: *~*~*~ Previous channel ManagedChannelImpl{logId=505, target=directaddress:///localhost/127.0.0.1:5555} was garbage collected without being shut down! ~*~*~*
    Make sure to call shutdown()/shutdownNow()
java.lang.RuntimeException: ManagedChannel allocation site
    at io.grpc.internal@1.63.0/io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init>(ManagedChannelOrphanWrapper.java:102)
    at io.grpc.internal@1.63.0/io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:60)
    at io.grpc.internal@1.63.0/io.grpc.internal.ManagedChannelOrphanWrapper.<init>(ManagedChannelOrphanWrapper.java:51)
    at io.grpc.internal@1.63.0/io.grpc.internal.ManagedChannelImplBuilder.build(ManagedChannelImplBuilder.java:672)
    at io.grpc@1.63.0/io.grpc.ForwardingChannelBuilder2.build(ForwardingChannelBuilder2.java:260)
    at org.apache.arrow.flight.core/org.apache.arrow.flight.TestServerOptions.addHealthCheckService(TestServerOptions.java:191)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)

but those seem to cause only warnings, not errors.

Component(s)

FlightRPC, Java

vibhatha commented 1 month ago

cc @lidavidm

laurentgo commented 1 month ago

I wonder if the issue is similar to the one hinted at in https://github.com/apache/arrow/blob/9ba9253e8527a7f3e2c6e47e631e278b8ca84e53/java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestDoExchange.java#L407, where data is somehow not fully reclaimed? (In my local test for TestDoExchange, the leak seems to be related to allocations triggered by Producer.)

lidavidm commented 1 month ago

Can anyone reproduce it locally?

laurentgo commented 1 month ago

I cannot for TestFlightSqlStreams

lidavidm commented 1 month ago

I think we need to do two things:

@vibhatha

vibhatha commented 1 month ago

I will take a look, but I need some time. Probably next week at the earliest.

vibhatha commented 1 month ago

We optionally track this info and it should be enabled in CI and the error message should be enhanced.

@lidavidm could I get a pointer on how to enable this in CI?

lidavidm commented 1 month ago

You'll have to look. We neutered it after performance complaints.

laurentgo commented 1 month ago

Another occurrence of a build failure: https://github.com/apache/arrow/actions/runs/9210096683/job/25336267356?pr=41800

vibhatha commented 1 month ago

The hard part is replicating this locally. I tried a few times yesterday with no luck. We need to enable what @lidavidm mentioned in CI, so I need to look into that.