apache / arrow-java

Official Java implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

[Java] After a forced client outage, the server is stuck in an endless loop #189

Open engimatic opened 1 year ago

engimatic commented 1 year ago

Describe the usage question you have. Please include as many useful details as possible.

try (BufferAllocator allocator = new RootAllocator()) {
    // Server
    try (final CookbookProducer producer = new CookbookProducer(allocator, location);
         final FlightServer flightServer = FlightServer.builder(allocator, location, producer).build()) {
        try {
            flightServer.start();
            System.out.println("S1: Server (Location): Listening on port " + flightServer.getPort());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

I handle the MySQL result set like this:

ResultSet resultSet = statement.executeQuery();
ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(
        resultSet, config);
public void handleArrowIterator(ArrowVectorIterator iterator, BufferAllocator allocator) {
        int index = 0;

        while (iterator.hasNext() && !listener.isCancelled()) {
            if (listener.isReady()) {
                try (VectorSchemaRoot root = iterator.next()) {
                    index++;
                    VectorUnloader unloader = new VectorUnloader(root);

                    ArrowRecordBatch arb = unloader.getRecordBatch();

                    loader.load(arb);
                    listener.putNext();
                }

            }

        }
        listener.completed();
    }

But when I force an outage of the client, listener.isCancelled() and listener.isReady() both always return false, and the server is stuck in an endless loop. How can I resolve this? Server exception message:

io.grpc.netty.NettyServerHandler -Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Stream closed before write could take place
    at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:172)
    at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:481)
    at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
    at io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:357)
    at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:1007)
    at io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:963)
    at io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:515)
    at io.netty.handler.codec.http2.DefaultHttp2Connection.close(DefaultHttp2Connection.java:153)
    at io.netty.handler.codec.http2.Http2ConnectionHandler$BaseDecoder.channelInactive(Http2ConnectionHandler.java:209)
    at io.netty.handler.codec.http2.Http2ConnectionHandler.channelInactive(Http2ConnectionHandler.java:417)
    at io.grpc.netty.NettyServerHandler.channelInactive(NettyServerHandler.java:628)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.nio.NioEventLoop.run(NioE

Component(s)

FlightRPC

engimatic commented 1 year ago

I also set an on-cancel handler with setOnCancelHandler like this, but it had no effect.

listener.setOnCancelHandler(() -> {
    listener.completed();
    log.error("---getStream cancel");
});
lidavidm commented 1 year ago

@davisusanibar

davisusanibar commented 1 year ago

Hi @engimatic, I would appreciate it if you could clarify which parts are Java code on the client side and which are part of the Java server, so that I can reproduce this as closely as possible. Thank you in advance.

engimatic commented 1 year ago

@davisusanibar Server code:

import cn.hutool.core.thread.ThreadFactoryBuilder;
import io.netty.util.internal.PlatformDependent;
import lombok.extern.slf4j.Slf4j;
import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
import org.apache.arrow.adapter.jdbc.JdbcToArrow;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
import org.apache.arrow.flight.FlightDescriptor;
import org.apache.arrow.flight.FlightServer;
import org.apache.arrow.flight.Location;
import org.apache.arrow.flight.NoOpFlightProducer;
import org.apache.arrow.flight.Ticket;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

@Slf4j
public class Server {
    /**
     * main.
     *
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException, IOException {
        int corePoolSize = Math.max(Runtime.getRuntime().availableProcessors(), 2) * 2;
        int maxPoolSize = corePoolSize * 2;
        ExecutorService executorService = new ThreadPoolExecutor(corePoolSize, maxPoolSize, 300,
                TimeUnit.SECONDS, new ArrayBlockingQueue<>(100),
                ThreadFactoryBuilder.create().setNamePrefix("globalExecutor_pool_").build(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        long maxSize = PlatformDependent.maxDirectMemory();
        long limit = maxSize - 1;
        Location location = Location.forGrpcInsecure("0.0.0.0", 8000);
        try (BufferAllocator allocator = new RootAllocator(limit)) {
            try (FlightServer flightServer = FlightServer.builder(allocator, location, new NoOpFlightProducer() {
                @Override
                public void getStream(CallContext context, Ticket ticket, ServerStreamListener listener) {
                    FlightDescriptor flightDescriptor = FlightDescriptor.path(
                            new String(ticket.getBytes(), StandardCharsets.UTF_8));
                    String sql = flightDescriptor.getPath().get(0);

                    Connection connection = null;
                    PreparedStatement statement = null;

                    JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator,
                            JdbcToArrowUtils.getUtcCalendar())
                            .build();

                    try {
                        Class<?> driverClass = Class.forName("com.mysql.jdbc.Driver");
                        Driver driver = (Driver) driverClass.newInstance();
                        String jdbcUrl = "jdbc:mysql://{ip}:{port}/test?"
                                + "useUnicode=true&socketTimeout=1800000&characterEncoding=UTF-8&autoReconnect=true&"
                                + "useSSL=false&zeroDateTimeBehavior=convertToNull&serverTimezone=Asia/Shanghai";
                        Properties info = new Properties();
                        info.put("user", user);
                        info.put("password", pass);
                        connection = driver.connect(jdbcUrl, info);
                        statement = connection.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY,
                                ResultSet.CONCUR_READ_ONLY);
                        statement.setFetchSize(Integer.MIN_VALUE);
                    } catch (ClassNotFoundException | SQLException | InstantiationException
                            | IllegalAccessException e) {
                        log.error(e.getMessage());
                    }

                    int index = 0;
                    VectorSchemaRoot vectorSchemaRoot = null;
                    try {
                        try (ResultSet resultSet = statement.executeQuery();
                             ArrowVectorIterator iterator = JdbcToArrow.sqlToArrowVectorIterator(
                                     resultSet, config)) {

                            while (iterator.hasNext() && !listener.isCancelled()) {
                                if (listener.isReady()) {
                                    try (VectorSchemaRoot root = iterator.next()) {
                                        index++;
                                        if (vectorSchemaRoot == null) {
                                            vectorSchemaRoot = root;
                                            listener.start(vectorSchemaRoot);
                                        }
                                        VectorLoader loader = new VectorLoader(vectorSchemaRoot);
                                        VectorUnloader unloader = new VectorUnloader(root);

                                        ArrowRecordBatch arb = unloader.getRecordBatch();

                                        loader.load(arb);
                                        listener.putNext();
                                        arb.close();
                                    }
                                    log.info("currentThreadName: {}, index: {}, "
                                                    + "allocator used {}, max {}, direct buffer used {}",
                                            Thread.currentThread().getName(), index, allocator.getAllocatedMemory(),
                                            allocator.getLimit(), PlatformDependent.usedDirectMemory());
                                }
                            }
                        }
                    } catch (SQLException | IOException e) {
                        log.error(e.getMessage());
                    } finally {
                        listener.completed();
                        if (vectorSchemaRoot != null) {
                            vectorSchemaRoot.close();
                        }
                    }
                }
            }).executor(executorService).build()) {
                flightServer.start();
                log.info("ArrowFlightApp: Server (Location): Listening on port {}, max buffer size {}",
                        flightServer.getPort(), allocator.getLimit());
                flightServer.awaitTermination();
            }
        }
    }
}

Client code:

import numpy as np
import pyarrow.flight as pf
import pyarrow as pa
import time
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed

client = pf.FlightClient("grpc://{ip}:8000")

def query(sql):
    ticket = pf.Ticket(str(sql).encode('utf-8'))
    start_time = time.time()
    reader = client.do_get(ticket)
    result = pd.DataFrame()
    for chunk in reader:
        chunk_df = pd.DataFrame()
        for num in range(chunk.data.num_columns):
            column = chunk.data.column(num)
            name = chunk.data.schema.names[num]
            if isinstance(column, (pa.Decimal128Array, pa.Decimal256Array)):
                # Convert decimals to float64 so pandas gets a numeric dtype.
                tmp_df = column.to_pandas().astype(np.float64).to_frame(name=name)
            else:
                tmp_df = column.to_pandas().to_frame(name=name)
            # Append every converted column, not just the last one.
            chunk_df = pd.concat([chunk_df, tmp_df], axis=1)
        result = pd.concat([result, chunk_df], ignore_index=True)

    print('convert data use time is : {}'.format(time.time() - start_time))
    return len(result)

with ThreadPoolExecutor(max_workers=20) as t:
    obj_list = []
    sql = '''SELECT * FROM test limit 100000'''
    for i in range(100):
        obj = t.submit(query, sql)
        obj_list.append(obj)

    j = 0
    for future in as_completed(obj_list):
        j = j + 1
        data = future.result()
        print(j, "Got rows total", data)
    client.close()

Maven dependencies:

<dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>flight-core</artifactId>
    <version>10.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-jdbc</artifactId>
    <version>10.0.0</version>
</dependency>
davisusanibar commented 1 year ago

Hi @engimatic, I can run both your client and server code.

Could you share your steps for testing the endless loop with me? I cannot reproduce the endless loop.

engimatic commented 1 year ago

@davisusanibar

After your client has been running for a little while, force-kill it. You will then find that the server is stuck in an endless loop at this code:

while (iterator.hasNext() && !listener.isCancelled()) {
                                if (listener.isReady()) {

The server's CPU usage then stays consistently high, and the next request never returns.
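
The spin described above can be modeled without Arrow. The sketch below is a stdlib-only illustration, not the Flight API: `StreamState` is a hypothetical stand-in for the two listener flags, the string batches stand in for record batches, and the iteration cap exists only so the demo terminates. When both flags stay false, the iterator is never advanced, so `hasNext()` stays true and the loop consumes CPU without sending anything.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Stdlib-only model of the polling loop in getStream; not the Arrow Flight API. */
final class SpinDemo {

    /** Hypothetical stand-in for the two listener flags the loop polls. */
    interface StreamState {
        boolean isReady();
        boolean isCancelled();
    }

    /** Batches sent and loop iterations spent in one run. */
    static final class Result {
        final int sent;
        final long iterations;

        Result(int sent, long iterations) {
            this.sent = sent;
            this.iterations = iterations;
        }
    }

    /** A state whose flags never change. */
    static StreamState fixed(final boolean ready, final boolean cancelled) {
        return new StreamState() {
            @Override public boolean isReady() { return ready; }
            @Override public boolean isCancelled() { return cancelled; }
        };
    }

    /** Same shape as the reported loop, plus an iteration cap so the demo ends. */
    static Result run(Iterator<String> batches, StreamState state, long maxIterations) {
        int sent = 0;
        long iterations = 0;
        while (iterations < maxIterations && batches.hasNext() && !state.isCancelled()) {
            iterations++;
            if (state.isReady()) {
                batches.next(); // stands in for loading the batch and calling putNext
                sent++;
            }
            // Not ready: nothing is consumed, so the loop condition cannot change.
        }
        return new Result(sent, iterations);
    }

    public static void main(String[] args) {
        Result healthy = run(Arrays.asList("b1", "b2", "b3").iterator(), fixed(true, false), 1_000_000);
        System.out.println("healthy client: sent=" + healthy.sent + " iterations=" + healthy.iterations);

        // Both flags stuck at false, as reported after the client is killed.
        Result dead = run(Arrays.asList("b1", "b2", "b3").iterator(), fixed(false, false), 1_000_000);
        System.out.println("killed client:  sent=" + dead.sent + " iterations=" + dead.iterations);
    }
}
```

In the killed-client case the loop reaches the cap with nothing sent; without the cap it would never exit, which matches the high CPU usage reported here.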

engimatic commented 1 year ago

@davisusanibar Hello, do you have any questions about this? Or is this a bug in Arrow Flight?

davisusanibar commented 1 year ago

Hi @engimatic, I was able to reproduce it on my last attempt.

  1. Run Java Flight Server: Flight Server
  2. Run Python Client: python clientuser.py
  3. Run another Python Client: python clientuser.py
  4. Kill Python client: pkill -9 -f clientuser.py
  5. The endless loop appears

Let me try to debug to understand the error with more detail.

davisusanibar commented 1 year ago

In my trace log, I'm seeing that the server responds with a TCP Zero Window segment, which effectively means "don't send me any more data, as I cannot handle it anyway". We may need to tune the ready/cancel handling for this type of scenario in order to avoid endless loops. I'll investigate this further.

image
davisusanibar commented 1 year ago

We need to review how the Java Flight server handles the window size for:

image
engimatic commented 1 year ago

@davisusanibar Maybe I should wait for the bug to be fixed. Is there a quick workaround in the meantime?
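
One possible stopgap, sketched here under assumptions rather than taken from the thread, is to replace the tight poll with a bounded wait: sleep briefly between checks and give up after a deadline. The code is a stdlib-only model; `StreamState`, `awaitReady`, `stream`, and the timeout values are hypothetical names and numbers chosen for illustration, and the string batches stand in for record batches.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

/** Stdlib-only sketch of a bounded wait for readiness; not the Arrow Flight API. */
final class BoundedWaitDemo {

    /** Hypothetical stand-in for the two listener flags the loop polls. */
    interface StreamState {
        boolean isReady();
        boolean isCancelled();
    }

    enum Wait { READY, CANCELLED, TIMEOUT }

    /** A state whose flags never change. */
    static StreamState fixed(final boolean ready, final boolean cancelled) {
        return new StreamState() {
            @Override public boolean isReady() { return ready; }
            @Override public boolean isCancelled() { return cancelled; }
        };
    }

    /** Polls with a short sleep instead of spinning, and stops at the deadline. */
    static Wait awaitReady(StreamState state, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (state.isCancelled()) {
                return Wait.CANCELLED;
            }
            if (state.isReady()) {
                return Wait.READY;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Wait.TIMEOUT;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return Wait.TIMEOUT;                // treat interruption like a timeout
            }
        }
    }

    /** Sends batches until the source is exhausted or the peer stops responding. */
    static int stream(Iterator<String> batches, StreamState state, long timeoutMillis) {
        int sent = 0;
        while (batches.hasNext()) {
            if (awaitReady(state, timeoutMillis, 5) != Wait.READY) {
                break; // cancelled or stalled: stop instead of looping forever
            }
            batches.next(); // stands in for loading the batch and calling putNext
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("healthy client sent: "
                + stream(Arrays.asList("b1", "b2", "b3").iterator(), fixed(true, false), 200));
        System.out.println("killed client sent:  "
                + stream(Arrays.asList("b1", "b2", "b3").iterator(), fixed(false, false), 200));
    }
}
```

In the real handler, a non-READY result would be the point to stop streaming and report the failure to the listener rather than keep polling; the right timeout depends on how long a slow but live client may legitimately stall. Arrow Flight's Java API also ships a `BackpressureStrategy` helper with a timed wait, which may cover the same need; its availability and behavior in 10.0.0 should be checked against the Javadoc.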