After you get the error, are the indices still being created? Try running this query:
select * from pg_stat_progress_create_index
Also, can you check whether postgresql and cardano-node are still running at that point? Any additional information about your environment would also be helpful.
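For example, you can run it from the host against the container and include the progress columns; the container, user and database names below are placeholders, adjust them to your setup:
# Placeholders: container "postgres", user "postgres", database "cexplorer".
docker exec -it postgres psql -U postgres -d cexplorer \
  -c "select pid, phase, blocks_done, blocks_total from pg_stat_progress_create_index;"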
Sorry, I should have given more details earlier. I am pretty confused by the nature of the issues we are having.
We are running both db-sync and pg in docker.
I ran the select query you gave; it didn't return anything.
I have noticed some strange behaviour. Sometimes it gives the error after waiting at this point:
[db-sync-node:Info:6] [2024-10-21 11:05:09.83 UTC] Found maintenance_work_mem=2GB, max_parallel_maintenance_workers=4
ExitFailure 2
Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log
Sometimes it gives the error that I mentioned earlier.
After throwing this error, the container restarts and starts syncing again. I can see it is waiting here now:
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Received block which is not in the db with HeaderFields {headerFieldSlot = SlotNo 137938707, headerFieldBlockNo = BlockNo 10990914, headerFieldHash = a544cd2f7bf24902ac5d9b0f674f67b02f46254b82fe8a6fafa58758f7956fba}. Time to restore consistency.
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516
I think it will error out after this; I am watching while it waits.
This message:
Errors in file: /tmp/migrate-2024-10-21T11:05:09.835467022Z.log
indicates there is a problem running a migration, which will cause db-sync to exit. Can you post the contents of that file?
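If the container gets recreated before you can read it, one option is to copy the file out to the host first; the container name "db-sync" below is a placeholder, adjust it to yours:
# List migration logs inside the container, newest first:
docker exec db-sync sh -c 'ls -t /tmp/migrate-*.log'
# Copy the log named in the error message out to the host:
docker cp db-sync:/tmp/migrate-2024-10-21T11:05:09.835467022Z.log .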
I kept losing that file because the container restarts. I am tailing the file right now; it doesn't show any messages yet.
Just now it crashed like this:
[db-sync-node:Info:81] [2024-10-21 12:41:50.32 UTC] Starting at epoch 516
[db-sync-node:Error:81] [2024-10-21 14:41:53.13 UTC] runDBThread: libpq: failed (no connection to the server )
[db-sync-node:Error:111] [2024-10-21 14:41:53.13 UTC] recvMsgRollForward: AsyncCancelled
[db-sync-node:Error:106] [2024-10-21 14:41:53.13 UTC] ChainSyncWithBlocksPtcl: AsyncCancelled
[db-sync-node.Subscription:Error:102] [2024-10-21 14:41:53.13 UTC] Identity Application Exception: LocalAddress "/home/cardano/ipc/node.socket" SubscriberError {seType = SubscriberWorkerCancelled, seMessage = "SubscriptionWorker exiting", seStack = []}
cardano-db-sync: libpq: failed (no connection to the server )
This is from the logs:
[repeated "(1 row)" output from the earlier migration steps omitted]
Running : migration-3-0001-20190816.sql
Running : migration-3-0002-20200521.sql
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
psql:/home/cardano/cardano-db-sync/schema/migration-3-0002-20200521.sql:4: error: connection to server was lost
ExitFailure 2
Is it possible you're running out of memory? It seems clear from the logs that you're losing the connection to the pg server.
listen_addresses = '*'
port = '5432'
max_connections = '600'
shared_buffers = '32GB'
effective_cache_size = '96GB'
maintenance_work_mem = '2GB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
random_page_cost = '1.0'
effective_io_concurrency = '200'
work_mem = '8GB'
min_wal_size = '1GB'
max_wal_size = '4GB'
max_worker_processes = '128'
max_parallel_workers_per_gather = '16'
max_parallel_workers = '64'
max_parallel_maintenance_workers = '4'
log_min_duration_statement = '2000'
This is our postgres.conf file. I do see high memory consumption, but it's not at 100%. Do you suggest any changes to the above?
You might want to check out this tool: https://pgtune.leopard.in.ua/. This is what I used to generate my configuration. For my config, I chose "online transaction processing system"
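As a quick sanity check that the values in the file are actually in effect inside the container, you can query pg_settings; the container, user and database names below are placeholders:
# Placeholders: container "postgres", user "postgres", database "cexplorer".
docker exec -it postgres psql -U postgres -d cexplorer -c \
  "select name, setting, unit from pg_settings
   where name in ('work_mem', 'maintenance_work_mem', 'shared_buffers', 'max_connections');"
Also keep in mind that work_mem is allocated per sort/hash operation per connection, so work_mem = '8GB' combined with max_connections = '600' can in the worst case demand far more than your physical RAM; pgtune will generally suggest a much smaller value.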
Here is the error; this is as far down as I was able to drill.
2024-10-23 14:57:27.050 GMT [176] LOG: could not receive data from client: Connection reset by peer
2024-10-23 14:57:27.050 GMT [176] LOG: unexpected EOF on client connection with an open transaction
That error simply says a client connection was terminated.
You would need to look at why your postgres DB crashed (if needed, look at it outside of docker first). It could be a myriad of reasons [e.g. running out of infrastructure memory - for which you can check OOM messages in the system logs, ulimits, corrupted DB WAL markers if you haven't cleared the existing DB before, etc.].
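A few quick checks for those causes; the container name "postgres" below is a placeholder:
# Kernel OOM-killer messages on the host:
dmesg -T | grep -iE 'killed process|out of memory'
# Whether docker itself OOM-killed the postgres container:
docker inspect -f '{{.State.OOMKilled}}' postgres
# Postgres container logs around the time of the crash:
docker logs --since 2h postgres | tail -n 200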
IMO, github is not the right medium for troubleshooting system/infra issues. Discord, the forum or stackexchange would be better places to search for existing threads or to start a new one with a better synopsis than what's presented here.
db-sync 13.5.0.2 with pg 14 is stuck here and not moving. Eventually it fails and goes back to the same stage. A higher work_mem is also set in pg.
[db-sync-node:Warning:81] [2024-10-18 11:25:58.77 UTC] Creating Indexes. This may require an extended period of time to perform. Setting a higher maintenance_work_mem from Postgres usually speeds up this process. These indexes are not used by db-sync but are meant for clients. If you want to skip some of these indexes, you can stop db-sync, delete or modify any migration-4-* files in the schema directory and restart it
Error:
[db-sync-node:Error:81] [2024-10-18 12:58:12.34 UTC] runDBThread: SqlError {sqlState = "", sqlExecStatus = FatalError, sqlErrorMsg = "", sqlErrorDetail = "", sqlErrorHint = ""}
Please help with this.