Closed richardstartin closed 2 years ago
Another failure, probably related: https://github.com/apache/pinot/runs/6380255258?check_suite_focus=true
The failure in https://github.com/apache/pinot/runs/6393195568?check_suite_focus=true seems to be a runner failure rather than anything related to gRPC.
...
2022-05-11T18:24:05.9718074Z 18:23:25.936 WARN [TimeBoundaryManager] [ClusterChangeHandlingThread] Failed to find segment with valid end time for table: mytable_OFFLINE, no time boundary generated
2022-05-11T18:24:05.9727643Z 18:23:41.210 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-GrpcBrokerClusterIntegrationTest-(f537ec4c_DEFAULT)] Event f537ec4c_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022-05-11T18:24:05.9730929Z 18:23:49.201 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-GrpcBrokerClusterIntegrationTest-(65a70c02_DEFAULT)] Event 65a70c02_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2022-05-11T18:24:36.5164660Z [ERROR] Killed
<-------- [RR] Seems to be a transient runner failure?
2022-05-11T18:24:38.1386025Z [INFO] Running org.apache.pinot.integration.tests.access.CertBasedTlsChannelAccessControlFactory$CertBasedTlsChannelAccessControl$1
2022-05-11T18:24:38.5703576Z [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.356 s - in org.apache.pinot.integration.tests.access.CertBasedTlsChannelAccessControlFactory$CertBasedTlsChannelAccessControl$1
2022-05-11T18:24:39.3086423Z [INFO] Running org.apache.pinot.integration.tests.access.CertBasedTlsChannelAccessControlFactory$CertBasedTlsChannelAccessControl
...
Regarding the second one:
2022-05-11T01:28:31.3764680Z 01:28:20.705 ERROR [StreamingSelectionOnlyCombineOperator] [grpc-default-executor-0] Timed out while polling results block (query: QueryContext{_tableName='mytable_OFFLINE', _subquery=null, _selectExpressions=[*], _aliasList=[null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=1000000, _offset=0, _queryOptions={}, _debugOptions=null, _expressionOverrideHints={}, _explain=false})
This looks like a query timeout. We should set a higher timeout value for this select * query, whose plan streams back about 110K rows, when it runs on GHA servers.
Another failure on the master branch: https://github.com/apache/pinot/runs/6517186511?check_suite_focus=true The JVM crashed during the test. We should try to figure out what caused the crash.
I think we need an RCA before closing these; a lot of these issues have been reopened.
Sorry, I was confused about the details of this issue. Since there was no reply to my previous 2 comments, I thought we had a consensus,
so I was only fixing the GRPCServer* test in https://github.com/apache/pinot/pull/8686. Let me take a look at the Broker one as well, then. Thanks for reopening.
@richardstartin @Jackie-Jiang any idea how I can get a core/thread dump in a GitHub Actions run? I can use the same stress-test technique as in 593a531ccc7a74cf33c626656229c56c693b23e1 for the broker, but I am not sure how to dump the state.
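One possible way to get the thread-dump half of this without any runner-side tooling is to have the JVM dump its own threads through the standard `java.lang.management` API. The sketch below is only an illustration: the class name `ThreadDumpSketch` and the idea of calling it from a watchdog or timeout handler in the stress test are assumptions, not something that exists in the Pinot test code. It does not produce a core dump, and it cannot help if the process is killed from outside before it gets a chance to run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of Pinot): renders a thread dump of the current JVM as text.
public class ThreadDumpSketch {

    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Request lock details only if this JVM supports them; passing true on an
        // unsupported JVM would throw UnsupportedOperationException.
        ThreadInfo[] infos = threads.dumpAllThreads(
            threads.isObjectMonitorUsageSupported(),
            threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(" state=").append(info.getThreadState())
               .append('\n');
            // Print every frame ourselves, since ThreadInfo.toString() truncates the stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Writing to stderr keeps the dump in the CI job log.
        System.err.println(dumpAllThreads());
    }
}
```

If the test hangs rather than crashes, calling `ThreadDumpSketch.dumpAllThreads()` from a timer thread shortly before the test timeout would put the state of every thread into the job log. Running the JDK's `jstack` against the test JVM's pid from a separate workflow step would be an alternative that needs no code change.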
https://github.com/apache/pinot/runs/6393195568?check_suite_focus=true