apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

Out-of-heap memory leaks in FlightClient.getStream #27439

Open asfimport opened 3 years ago

asfimport commented 3 years ago

I'm trying to use Arrow and Flight in my Java project. I use FlightClient.getStream to request data from a remote node, but I found that the allocator's allocated memory size grows with each request.

Java version: 8

Flight&Arrow version: 3.0.0

 

Here's a snippet of my code. I found that the memory allocated for flightStream still exists after root.close():

    try (FlightStream flightStream = client.getStream(ticket, CallOptions.timeout(1, TimeUnit.MILLISECONDS))) {
        VectorSchemaRoot root = flightStream.getRoot();
        root.clear();
        while (flightStream.next()) {
            // transfer root to ArrowRecordBatch and execute
            dispatchNext(shuffleChunk, root);
            root.clear();
        }
        root.close();
    } catch (FlightRuntimeException fe) {
        String errorMsg = "IndexReadOperator read index fail ={}  " + indexChunkDesc.toString();
        LOGGER.error(errorMsg, fe);
        throw new NetworkException(errorMsg, fe);
    } catch (Exception e) {
        LOGGER.error("IndexReadOperator read index fail!", e);
        throw new NetworkException("IndexReadOperator getStream fail! ", e);
    }

 

Is this a bug or am I using it incorrectly?

 

 

Environment: Java version: 8; Flight & Arrow version: 3.0.0
Reporter: lulijun

Note: This issue was originally created as ARROW-11569. Please see the migration documentation for further details.

asfimport commented 3 years ago

David Li / @lidavidm: [~lulijun] Just to double-check, are you also closing the ArrowRecordBatch you generate? It's not clear from the snippet here, and that's a common mistake.
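
To illustrate why an unclosed batch could explain the growing allocator total: a record batch built from the root typically holds its own reference to the underlying buffers, so root.clear() or root.close() alone would not return the memory. The sketch below is a toy model using only the JDK, not Arrow's API. ToyAllocator, ToyBuffer, and ToyBatch are hypothetical stand-ins for the allocator, a reference-counted buffer, and ArrowRecordBatch, and the 1024-byte size is arbitrary. It assumes the batch retains the buffer on creation and releases it on close().

```java
import java.util.concurrent.atomic.AtomicLong;

class BatchCloseSketch {

    /** Hypothetical stand-in for an allocator: only tracks outstanding bytes. */
    static final class ToyAllocator {
        private final AtomicLong allocated = new AtomicLong();

        long getAllocatedMemory() {
            return allocated.get();
        }
    }

    /** Hypothetical reference-counted buffer; bytes are returned when the count reaches zero. */
    static final class ToyBuffer {
        private final ToyAllocator owner;
        private final long size;
        private int refCount = 1; // the creator (the "root") holds the first reference

        ToyBuffer(ToyAllocator owner, long size) {
            this.owner = owner;
            this.size = size;
            owner.allocated.addAndGet(size);
        }

        void retain() {
            refCount++;
        }

        void release() {
            if (--refCount == 0) {
                owner.allocated.addAndGet(-size);
            }
        }
    }

    /** Hypothetical stand-in for a record batch: retains the buffer until closed. */
    static final class ToyBatch implements AutoCloseable {
        private final ToyBuffer buffer;

        ToyBatch(ToyBuffer buffer) {
            this.buffer = buffer;
            buffer.retain();
        }

        @Override
        public void close() {
            buffer.release();
        }
    }

    /** Drops the root's reference but never closes the batch. */
    public static long leakyRequest() {
        ToyAllocator allocator = new ToyAllocator();
        ToyBuffer rootBuffer = new ToyBuffer(allocator, 1024);
        ToyBatch batch = new ToyBatch(rootBuffer); // batch now shares the buffer
        rootBuffer.release();                      // analogous to clearing/closing the root
        return allocator.getAllocatedMemory();     // batch still pins the bytes
    }

    /** Same flow, but the batch is closed by try-with-resources. */
    public static long safeRequest() {
        ToyAllocator allocator = new ToyAllocator();
        ToyBuffer rootBuffer = new ToyBuffer(allocator, 1024);
        try (ToyBatch batch = new ToyBatch(rootBuffer)) {
            rootBuffer.release(); // root gives up its reference while the batch is in use
        }
        return allocator.getAllocatedMemory();
    }

    public static void main(String[] args) {
        System.out.println("batch not closed: " + leakyRequest() + " bytes outstanding");
        System.out.println("batch closed:     " + safeRequest() + " bytes outstanding");
    }
}
```

If the real code follows the same ownership pattern, wrapping each generated ArrowRecordBatch in try-with-resources (or closing it once the downstream consumer is done) would be the first thing to check before treating this as a leak in FlightClient.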