apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Fatal Error - Problematic frame libc.so.6+0x15c6ff - Server Crash #6493

Open opschronicle opened 3 years ago

opschronicle commented 3 years ago

I am facing an issue where both Pinot servers fail to restart and keep crashing with the following error. I am using the latest image. Has anyone come across this?

 [Times: user=0.02 sys=0.00, real=0.00 secs]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f104649b6ff, pid=1, tid=0x00007ee665d06700
#
# JRE version: OpenJDK Runtime Environment (8.0_282-b08) (build 1.8.0_282-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.282-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x15c6ff]
#
# Core dump written. Default location: /opt/pinot/core or core.1
#
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid1.log

opschronicle commented 3 years ago

my-pinot-server-0 server  [Times: user=0.08 sys=0.00, real=0.02 secs]
my-pinot-controller-0 controller 2021/01/26 22:17:09.409 WARN [CallbackHandler] [ZkClient-EventThread-29-my-pinot-zk-cp-zookeeper.logging.svc.cluster.local:2181] Callback handler received event in wrong order. Listener: org.apache.helix.controller.GenericHelixController@362617cf, path: /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES, expected types: [INIT] but was CALLBACK
my-pinot-server-1 server 2021/01/26 22:17:09.588 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-18-my-pinot-zk-cp-zookeeper.logging.svc.cluster.local:2181] Fail to read record for paths: {/pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/7fdbc2b8-d50c-43e0-8aae-8f81149ff9f6=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ebfe4780-7d65-42a0-9293-8de8be680b52=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5cae61b1-c190-4ba4-8d75-2f55fbef3204=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b3a4690d-8f0b-49f7-a950-6fd56746760c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/0febb043-2ddb-44b6-8637-ac9ec8e991c4=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/08aa0bbe-ddbc-4a40-8e4b-a94faee38d4c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/4c94c196-7ed7-46cc-9b48-a7d3928df6c5=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/f8d49a81-2834-4a12-9908-e8f52f6a8b9b=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/2eb95020-540c-46e7-b18b-56232d5079ce=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ddc77c89-593b-4278-954f-6d0aad4bba4a=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/09a788d4-f94b-4172-a7b5-ed6df22523b7=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b12423f6-22ba-46cb-a23a-84f20dbb7bb8=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b8cb57cc-db6c-414a-a97e-b4fd2147ce14=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/c70e5415-1520-485d-9eb7-2d43940422a3=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/352ead98-9e5d-4572-b4a9-224c7dff6a3d=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b0405375-5316-41ad-b134-ed55d2b54407=-101}
my-pinot-controller-0 controller 2021/01/26 22:17:09.600 WARN [CallbackHandler] [ZkClient-EventThread-29-my-pinot-zk-cp-zookeeper.logging.svc.cluster.local:2181] Callback handler received event in wrong order. Listener: org.apache.helix.controller.GenericHelixController@362617cf, path: /pinot-quickstart/INSTANCES/Server_my-pinot-server-1.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES, expected types: [INIT] but was CALLBACK
my-pinot-server-0 server 2021-01-26T22:17:09.358+0000: 113.899: Total time for which application threads were

opschronicle commented 3 years ago

my-pinot-controller-0 controller 2021/01/26 22:17:12.039 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-quickstart-(83ee1db3_TASK)] Fail to read record for paths: {/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/51a431e9-c0e1-4e08-9bcb-aee747608526=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/f9005198-fe85-4cad-84e8-0df918de95d9=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ef3e0a42-eee4-43a4-bb11-bd1878937e2c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/2fd3aeea-91d8-446d-b87f-eeb3d99e24c9=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/03239839-70d8-49ed-84eb-6d9d4bc3b777=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/d7c99ffe-7a4d-433b-a265-cf8c9fba6a5f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/d299feba-1781-407e-a7b3-83287fbd0997=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/68902832-2e61-4e6a-b249-e0e74711d493=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/084390ed-cccd-4171-b4c1-45c2f43ddc1a=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/53c3065c-60e3-4355-ab33-9f32221c5407=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/46f7ae44-ee3d-45c0-8781-6cedc67ea518=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ba4e44c4-15d1-46a7-ac99-3f4fd11db97f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/dd3e8c08-e914-4245-8857-3d1229573c35=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/95ad2084-572f-4651-b009-1f3eaab73e6b=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/cc5bea58-9618-4746-9a6c-a6b8e5afdfaa=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/3491cf67-e305-4767-9d55-0963f9ffdf00=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/c7dca37a-e4d6-4fbf-a244-ec5b92724c9e=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b224f8f5-ea43-419a-a3af-11dace5e3f16=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ae7c5ede-f7ec-45bb-a095-97210c6a7932=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/15a7fbcc-48b5-4300-9338-a0b009151e9e=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/f46c6152-858f-44f6-8658-f7c7540915f5=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/7a6a008a-33a2-4b08-9aac-db2fd0c00396=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/7819229e-21af-4760-9ce3-c38e446c1e14=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5dcd8621-f658-4c49-be47-41aea30e9cb4=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/92519c36-bcfb-4501-8dbb-0702ebdfd708=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5fa1b752-39e8-4563-847e-4eaef3c3d5b0=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/74263ff4-734d-4529-966c-f0cb0927593f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/98bb3b99-80d8-40bb-bdb6-d34f0b5e1090=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/39114365-1976-42ab-882d-e729eb70c143=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/86c29e7a-c24d-46f1-9efb-865b02b8999a=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/bc02072d-68db-4264-acb6-e6ded758a74a=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/aec1490b-4a0f-4d66-aad7-f2801b172b65=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/d894e3b1-e597-484f-a748-96a470c93f93=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/1b1ec9f0-c17f-44f7-b6a1-26c0c141892f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/8f0d8d0a-ec3a-4b11-bbf8-f44e0b611d3f=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/230e18fe-135b-4892-ac34-ae7909874224=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/3d64a6bb-5e6c-4f72-87eb-ac295cb7c0e3=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/4e0b1fb2-f514-4350-92ac-2162c408f947=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b110753b-49e3-42d8-8a92-f635247a5d32=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/3891485f-621c-4a62-a702-4de7b9cda0ca=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/b221af1a-3704-4290-a980-4e6fac076d08=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5b970ff8-8f17-4518-9ffc-7ff72a0bc52b=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/7401939f-83a1-433b-9a4b-801e8b110b49=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/16a027cf-5c09-4b5f-9030-8bc8f0afcded=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/1ef817b6-21cc-48ae-bd16-ed2ad2e80b6d=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/8e9f37bf-25e0-4086-8401-df0f73850a6b=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/88f453a1-eca9-4b2c-9e4f-87e88da6e4ed=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/d0d42bf9-3b12-4686-8c7d-cd7bde6284cc=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/4f2526dc-cdf6-4035-b1b6-bef9f8871583=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/2b020fac-6c46-4eb2-9802-1cb7bd7c586b=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/11b63a38-99d9-442c-9eca-56f7d427c9a1=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/372cec23-e89b-421d-b611-b9bd35ed6bb3=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5a0aa3ae-d134-47e1-8fae-53aa0d70ee99=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/aec9d3ad-f1e9-4429-b031-1207534e9dff=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/fd9592c2-22e9-4536-a8ef-4d4e1eab6d5f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/0c36d8f7-7905-48a7-befe-7f48693f4c17=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/0e31dca5-4234-4842-a5ac-b44ccd01d1cb=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/d998d2e5-1796-42be-8848-9154d0d0be2f=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/3e201c8c-f8cb-431c-9e10-ab7da5a03eef=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/7bb908de-8c97-4cc7-8499-440d2e7ce19c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/06af6769-999b-4aa9-b1ec-4d3f063eb998=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/84eb9eb3-3ff4-4c47-9bb4-daa685ff7290=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5f3a394a-0f9e-410b-ab9f-fd6989052588=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/42c83d12-35b8-4999-ace6-c51a9c15f5da=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/53e12a60-c891-4182-a8c6-a04cac8f7e9c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/39ef8da1-1fd3-4de7-a150-1962ccda4194=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/39cb56e1-2dca-4c72-9bb6-ad98e83bd6fe=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/88580d08-3d66-4c48-8678-8d2879687ddc=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5b1e6f1e-51d5-438d-a207-3af761fe960c=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/ad69603f-3ae7-43b0-9449-2f732c8acdfb=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/5514c26b-2d5e-47a3-839f-3c5b98303bc9=-101, 
/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/25c2728e-73eb-4ebe-b2d4-e61708348652=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/0b22e1da-632b-430c-be22-cc0c46ad0a75=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/dfd7e352-76e0-4a24-9294-84b72106d709=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/bea0dc31-8f14-4c54-8097-64f11b0a0f22=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/56805ab0-d620-401f-a6de-d2890709151d=-101}
my-pinot-server-0 server 2021-01-26T22:17:11.909+0000: 116.450: Total time for which application threads were stopped: 0.0134050 seconds, Stopping threads took: 0.0001936 seconds

kishoreg commented 3 years ago

What changed?

opschronicle commented 3 years ago

@kishoreg Nothing changed. It was running for 3 days without a hitch and then suddenly kept crashing. I changed the memory settings from -Xms512M -Xmx4G to -Xms12G -Xmx12G and the servers are up. Not sure what happened here. Also, I cannot get hold of the core dump or a thread dump because the server runs in a container. I may have to map /opt/pinot to a PVC.

opschronicle commented 3 years ago

Anyway, all the segments are in a corrupted state. It seems I have to restore the server from backup again.

opschronicle commented 3 years ago

I reloaded the segments and the server is back online. Is there any way to speed up the catch-up process from the Kafka stream?

kishoreg commented 3 years ago

Add more servers and, once they catch up, shrink the cluster and rebalance.

opschronicle commented 3 years ago

Thanks @kishoreg. The server did not recover; it went into a bad state again and the Pinot server crashed. I had to do it the hard way: delete and recreate. What are the possible causes of a server crash in Pinot, and what steps can we normally take to avoid it?

kishoreg commented 3 years ago

Any logs that you can share?

opschronicle commented 3 years ago

@kishoreg, it crashed again; the Pinot server stopped consuming messages and queries started to give incorrect results. The errors I can see are below.


my-pinot-broker-0 broker 2021/01/28 14:15:07.151 ERROR [QueryRouter] [jersey-server-managed-async-executor-3] Caught exception while sending request 549 to server: my-pinot-server-0_R, marking query failed
my-pinot-broker-0 broker 2021/01/28 14:15:10.917 ERROR [QueryRouter] [jersey-server-managed-async-executor-3] Caught exception while sending request 550 to server: my-pinot-server-0_R, marking query failed
my-pinot-broker-0 broker 2021-01-28T14:15:10.918+0000: 64927.313: Total time for which application threads were stopped: 0.0000965 seconds, Stopping threads too2021/01/28 14:15:11.008 ERROR [QueryRouter] [jersey-server-managed-async-executor-3] Caught exception while sending request 551 to server: my-pinot-server-0_R, marking query failed
my-pinot-broker-0 broker 2021/01/28 14:15:12.104 ERROR [DataTableHandler] [nioEventLoopGroup-2-6] Caught exception while handling response from server: my-pinot-server-1_R
my-pinot-broker-0 broker 2021/01/28 14:15:12.106 ERROR [DataTableHandler] [nioEventLoopGroup-2-6] Channel for server: my-pinot-server-1_R is now inactive, marking server down
my-pinot-broker-0 broker 2021/01/28 14:20:30.912 ERROR [DataTableHandler] [nioEventLoopGroup-2-4] Channel for server: my-pinot-server-0_R is now inactive, marking server down
Stopping threads took: 0.0000264 seconds
2021-01-28T14:29:13.291+0000: 2021/01/28 14:29:13.301 WARN [SegmentColumnarIndexCreator] [Thread-7] Caught exception java.io.IOException: No space left on device while refreshing realtime lucene reader for segment: sblog__0__515__20210128T1354Z
sb-pinot-controller-0sb-pinot-server-0 server 21.588: Total time for which application threads were stopped: 0.0003119 seconds, Stopping threads took: 0.0001239 seconds

opschronicle commented 3 years ago

@kishoreg What could be the possible reasons the disk space fills up when I try to catch up on messages from the stream? This only happens when I recreate the table and try to catch up on messages.

Jackie-Jiang commented 3 years ago

Storage fills up naturally as the servers consume more records and flush them to disk. Can you please check the disk space left in your data directory?
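
A minimal sketch of such a check in Python, assuming it is run on the server host or inside the container. The function names and the threshold parameter are hypothetical and not part of Pinot; pass whatever data directory your server is configured with.

```python
import shutil


def disk_usage_percent(path):
    """Return the used share (0-100) of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def is_above_threshold(path, threshold_percent):
    """True when the filesystem holding `path` is fuller than the threshold."""
    return disk_usage_percent(path) > threshold_percent
```

For example, `is_above_threshold("/path/to/data/dir", 90)` returns `True` when the volume behind that directory is more than 90% full. The figure can differ slightly from what `df` reports, because `df` accounts for reserved blocks.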

opschronicle commented 3 years ago

@Jackie-Jiang Thanks, it was at 100%. I deleted some data from the stream. But I am not sure why a full stream catch-up takes more disk space than normal realtime consumption. From my understanding, the data size for three days of retention on Kafka should remain the same whether we consume it in Pinot in one shot or over three days.

Jackie-Jiang commented 3 years ago

While consuming, Pinot allocates mmap'ed files as its off-heap memory buffer. If the consumption rate is very high, these files can be larger because each segment will hold more records. I would recommend leaving more headroom in storage, e.g. keeping the usage under 50%.
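
To illustrate why a file-backed memory map counts against disk space, here is a small Python sketch. It is not Pinot's implementation; the function name and sizes are made up for the example. It maps a file, writes through the mapping, and shows that the bytes end up in a file on the filesystem.

```python
import mmap
import os


def write_via_mmap(path, size, payload):
    """Create a `size`-byte file, map it, and write `payload` at offset 0.

    Returns the file's size on the filesystem after the write.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # set the file's logical size
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as buf:
            buf[:len(payload)] = payload  # looks like a memory write
            buf.flush()                   # dirty pages go to the backing file
    return os.path.getsize(path)
```

The process sees the buffer as memory, but every page written has to be backed by a block on the volume. On Linux, a write into a mapped page that the filesystem cannot back is commonly reported as SIGBUS, which would be consistent with the crash at the top of this issue, although nothing in the thread confirms that link.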

opschronicle commented 3 years ago

Thanks for the suggestion @Jackie-Jiang, I will allocate more space. I am also wondering whether it is possible to set the segment time based on the time field from Kafka, so that segments expire based on the Kafka time rather than the Pinot segment creation time. My retention is 3 days. The issue happens when I have to catch up on the entire 3 days of messages at once: all the caught-up segments will have the same date, will only expire after 3 days, and will consume a lot of disk space unnecessarily.

Jackie-Jiang commented 3 years ago

@pabrahamusa Pinot uses the time column value (which should be the time field in Kafka as well) to expire segments, not the segment creation time. The segment will still be created, but it should be removed once its latest time value is older than the retention.
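
The rule described above can be sketched in Python as follows. This is only an illustration of the comparison, not Pinot's retention code; the function names, segment names, and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_segment_expired(segment_end_time, retention, now):
    """A segment expires once its latest time-column value is older than
    the retention window, regardless of when the segment was created."""
    return segment_end_time < now - retention


def expired_segments(segments, retention, now):
    """Return the names of segments whose latest record is past retention.

    `segments` maps a segment name to its latest time-column value.
    """
    return [name for name, end_time in segments.items()
            if is_segment_expired(end_time, retention, now)]


# Both segments were built during a catch-up on 2021-01-29, but they
# hold records from different days.
now = datetime(2021, 1, 29, tzinfo=timezone.utc)
retention = timedelta(days=3)
segments = {
    "seg_old": datetime(2021, 1, 25, tzinfo=timezone.utc),
    "seg_recent": datetime(2021, 1, 28, tzinfo=timezone.utc),
}
print(expired_segments(segments, retention, now))  # prints ['seg_old']
```

Under this rule, segments built in one catch-up run do not all share the same expiry: each one ages out according to the data it contains.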

opschronicle commented 3 years ago

@Jackie-Jiang Thanks, I will verify this and confirm.

Conaire commented 2 months ago

The problem for me was not having enough space in my EBS volume.