Open vncoelho opened 6 years ago
Hi @vncoelho, are both containers crashing at the same time? It seems Postgres becomes unavailable. Does it happen only at startup, or can it also happen after a long period of running well?
API container
Loading neoscan..
Starting dependencies..
Starting repos..
Running migrations for neoscan
2018-10-27 13:35:29.907 [info] [nonode@nohost] == Running Neoscan.Repo.Migrations.Counters.change/0 forward
2018-10-27 13:35:29.909 [info] [nonode@nohost] create table counters_cached
{"init terminating in do_boot",{#{'__exception__'=>true,'__struct__'=>'Elixir.Postgrex.Error',connection_id=>80,message=>nil,postgres=>#{code=>duplicate_table,file=><<"heap.c">>,line=><<"1067">>,message=><<"relation \"counters_cached\" already exists">>,pg_code=><<"42P07">>,routine=><<"heap_create_with_catalog">>,severity=><<"ERROR">>,unknown=><<"ERROR">>}},[{'Elixir.Ecto.Adapters.SQL','query!',5,[{file,"lib/ecto/adapters/sql.ex"},{line,200}]},{'Elixir.Ecto.Adapters.Postgres','-execute_ddl/3-fun-0-',4,[{file,"lib/ecto/adapters/postgres.ex"},{line,96}]},{'Elixir.Enum','-reduce/3-lists^foldl/2-0-',3,[{file,"lib/enum.ex"},{line,1925}]},{'Elixir.Ecto.Adapters.Postgres',execute_ddl,3,[{file,"lib/ecto/adapters/postgres.ex"},{line,96}]},{'Elixir.Ecto.Migration.Runner','-flush/0-fun-1-',2,[{file,"lib/ecto/migration/runner.ex"},{line,104}]},{'Elixir.Enum','-reduce/3-lists^foldl/2-0-',3,[{file,"lib/enum.ex"},{line,1925}]},{'Elixir.Ecto.Migration.Runner',flush,0,[{file,"lib/ecto/migration/runner.ex"},{line,102}]},{timer,tc,2,[{file,"timer.erl"},{line,181}]}]}}
init terminating in do_boot ({,[{Elixir.Ecto.Adapters.SQL,query!,5,[{_},{_}]},{Elixir.Ecto.Adapters.Postgres,-execute_ddl/3-fun-0-,4,[{_},{_}]},{Elixir.Enum,-reduce/3-lists^foldl/2-0-,3,[{_},{_}]},{El
Postgres:
/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
waiting for server to shut down....2018-10-27 13:35:21.180 UTC [43] LOG: received fast shutdown request
2018-10-27 13:35:21.182 UTC [43] LOG: aborting any active transactions
2018-10-27 13:35:21.185 UTC [43] LOG: worker process: logical replication launcher (PID 50) exited with exit code 1
2018-10-27 13:35:21.185 UTC [45] LOG: shutting down
2018-10-27 13:35:21.221 UTC [43] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2018-10-27 13:35:21.320 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2018-10-27 13:35:21.320 UTC [1] LOG: listening on IPv6 address "::", port 5432
2018-10-27 13:35:21.330 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-10-27 13:35:21.355 UTC [61] LOG: database system was shut down at 2018-10-27 13:35:21 UTC
2018-10-27 13:35:21.365 UTC [1] LOG: database system is ready to accept connections
2018-10-27 13:35:29.909 UTC [80] ERROR: relation "counters_cached" already exists
2018-10-27 13:35:29.909 UTC [80] STATEMENT: CREATE TABLE "counters_cached" ("name" varchar(255), "ref" bytea, "value" integer NOT NULL, PRIMARY KEY ("name", "ref"))
@vncoelho there is a known issue: if Postgres is empty and both containers start at the same time, both will try to run the migrations, which can leave the database in an inconsistent and unusable state. That only applies to the start-up case, though, and it seems to be what your latest post shows.
For the initial error you posted, it seems the system was running fine, so I expect the migrations were done properly and the db connection suddenly dropped?
I think that you are right, @adrienmo. In general, this error happens during the start-up of a private network, as you suspected.
Maybe we could insert a delay in the start-up of both containers (api and sync) until Postgres has at least one block. What do you think?
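As a rough illustration of the delay idea, here is a minimal sketch of a start-up gate that blocks until a TCP port accepts connections. The helper name `wait_for_port` and the host name `postgres` in the usage note are assumptions for illustration, not part of neo-scan. An open port only shows that the server is listening; checking that at least one block has been synced would need an actual query, which is not shown here. A gate like this also would not stop two containers from running the migrations at the same time.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening,
    not that the database is migrated or holds any blocks.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server socket is accepting clients.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint script could call `wait_for_port("postgres", 5432)` (assumed service name, port taken from the log above) and only start the api or sync process when it returns True.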
@vncoelho I think the best option would be to make the migration safe against the race condition, or maybe to have only one container run the migrations. I will check what the options are.
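To make the first option concrete, here is a small sketch of why a plain `CREATE TABLE` fails on the second run while an idempotent `CREATE TABLE IF NOT EXISTS` does not. It uses Python's built-in `sqlite3` as a stand-in for Postgres, with `BLOB` in place of `bytea`; the function names are hypothetical and this is not how neo-scan's Ecto migrations are written. The two runs here are sequential. For truly simultaneous runs against Postgres, a lock such as `pg_advisory_lock` or a single migrating container would probably still be needed.

```python
import sqlite3

# Same columns as the failing statement in the Postgres log, adapted to SQLite types.
PLAIN_DDL = """
CREATE TABLE counters_cached (
    name  VARCHAR(255),
    ref   BLOB,
    value INTEGER NOT NULL,
    PRIMARY KEY (name, ref)
)
"""
IDEMPOTENT_DDL = PLAIN_DDL.replace("CREATE TABLE", "CREATE TABLE IF NOT EXISTS", 1)


def run_migration(conn, ddl):
    """Apply one DDL statement; return True on success, False on an operational error."""
    try:
        conn.execute(ddl)
        conn.commit()
        return True
    except sqlite3.OperationalError:
        # For the plain DDL this is "table counters_cached already exists".
        conn.rollback()
        return False


def simulate_two_starts(ddl):
    """Run the same migration twice, as two containers starting on one database would."""
    conn = sqlite3.connect(":memory:")
    try:
        return [run_migration(conn, ddl), run_migration(conn, ddl)]
    finally:
        conn.close()
```

With the plain statement, `simulate_two_starts(PLAIN_DDL)` reports a failure for the second run, which mirrors the `relation "counters_cached" already exists` error above; with `IDEMPOTENT_DDL` both runs succeed.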
Hey, @adrienmo
Thanks, no hurry on this anyway. It is just a minor incident under these specific conditions. In addition, it seems that Docker itself sometimes has a problem when it creates a lot of containers with docker-compose. I reported it to docker-compose, but they declined the report and asked me to take it to the Docker repository...
Let's see how it flows.
Sometimes we need to run service docker restart in order to really get a connection between nodes.
We'll keep in touch,
Hi, @adrienmo,
Sometimes, unexpectedly, the neo-scan api and sync containers crash here in our experiments. I believe it is because we are making several requests before the containers are fully ready (in general it happens when no block has been synced yet). What do you think? One of the logs is: