apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Tracing doesn't work in some multi-server setups #10399

Open walterddr opened 1 year ago

walterddr commented 1 year ago

Currently, trace info is recorded in a thread-local context keyed by requestID. This causes problems in several setups:

  1. When two servers are launched in the same JVM, as with the quickstart runner, their requestIDs can collide and cause issues.
  2. When two brokers independently generate the same long requestID and send their queries to the same server simultaneously, the traces also contend (although this is very rare).
  3. For the multi-stage engine, the same can happen much more frequently, because the same requestID with different stageIDs can be sent to the same server (addressed in #10390).
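All three cases come down to requestID alone not being unique. A minimal sketch of one way around that, assuming the trace entries live in some map shared across the JVM (a collision between two servers in one JVM implies shared state): key each trace by a composite of broker ID, requestID, stageID, and server instance. `TraceRegistrySketch`, `TraceKey`, and the broker/server identifiers are hypothetical and are not Pinot's `TraceContext` API. The nested record requires Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch, not Pinot's TraceContext: traces keyed by a composite key
// instead of requestID alone.
class TraceRegistrySketch {

  // requestID by itself can repeat across brokers, across servers sharing one JVM,
  // and across stages of one multi-stage query, so all of those go into the key.
  record TraceKey(String brokerId, long requestId, int stageId, String serverInstance) { }

  // ConcurrentLinkedQueue lets concurrent appends under one key proceed without
  // the fail-fast ConcurrentModificationException an ArrayList would raise.
  private final Map<TraceKey, Queue<String>> traces = new ConcurrentHashMap<>();

  // Appends one trace event, creating the entry for this key on first use.
  void log(TraceKey key, String event) {
    traces.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(event);
  }

  // Removes the entry for this key and returns an immutable snapshot of its events.
  List<String> getAndUnregister(TraceKey key) {
    Queue<String> events = traces.remove(key);
    return events == null ? List.of() : List.copyOf(events);
  }

  // Number of traces currently registered.
  int size() {
    return traces.size();
  }
}
```

With broker ID, stageID, and server instance in the key, two brokers that happen to pick requestID 42, or two stages of one multi-stage query on the same server, get separate entries. Which fields are actually needed depends on how Pinot assigns IDs; the set here is illustrative.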

Log sample:

java.util.ConcurrentModificationException: null
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1043) ~[?:?]
    at java.util.ArrayList$Itr.next(ArrayList.java:997) ~[?:?]
    at org.apache.pinot.core.util.trace.TraceContext$Trace.toJson(TraceContext.java:91) ~[classes/:?]
    at org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:195) ~[classes/:?]
    at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.executeInternal(ServerQueryExecutorV1Impl.java:284) ~[classes/:?]
    at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.execute(ServerQueryExecutorV1Impl.java:146) ~[classes/:?]
    at org.apache.pinot.core.query.executor.QueryExecutor.execute(QueryExecutor.java:100) ~[classes/:?]
    at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:154) ~[classes/:?]
    at org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:136) ~[classes/:?]
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
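The trace above is `ArrayList`'s fail-fast iterator firing inside `Trace.toJson`: the list was structurally modified while being iterated, presumably by another query sharing the same requestID appending to the same trace. A minimal single-threaded reproduction, in which the "other writer" runs inline during iteration so the failure is deterministic; `TraceListCmeSketch` and `toJsonWhileAppending` are hypothetical names, not Pinot code.

```java
import java.util.List;

// Hypothetical reproduction of the ConcurrentModificationException seen in Trace.toJson.
class TraceListCmeSketch {

  // Serializes the entries to a JSON-like array while appending one extra entry
  // mid-iteration, standing in for a second request that logs to the same trace.
  static String toJsonWhileAppending(List<String> entries) {
    StringBuilder json = new StringBuilder("[");
    boolean appended = false;
    for (String entry : entries) {
      if (!appended) {
        // Structural modification during iteration: an ArrayList iterator throws
        // ConcurrentModificationException on its next call to next().
        entries.add("late-entry");
        appended = true;
      }
      if (json.length() > 1) {
        json.append(',');
      }
      json.append('"').append(entry).append('"');
    }
    return json.append(']').toString();
  }
}
```

Passing an `ArrayList` throws the same exception as the log; passing a `java.util.concurrent.CopyOnWriteArrayList` does not, because its iterator works on a snapshot taken when iteration starts. That only suppresses the symptom, though: entries from two colliding queries would still be mixed into one trace, so the key collision itself still needs fixing.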
walterddr commented 1 year ago

Hitting this again on https://github.com/apache/pinot/pull/10711.