*Closed: @maximal closed this issue 4 years ago.*
@maximal the seemingly infinite loop is expected and it will eventually settle (and you should wait until it settles) as mentioned in #323.
For the binding issue, I suspect you have some custom configuration or environment variables you have set, getting uwsgi confused about ports and protocols. Can you share your configuration and custom env variables?
How long might it take? I waited a couple of hours.
My `.env`:

```
COMPOSE_PROJECT_NAME=sentry_onpremise
SENTRY_EVENT_RETENTION_DAYS=90
```

The installation is very basic, except that I use nginx as a proxy (domain → localhost:9000).
Can confirm, ran a fresh install. It's looping over the Kafka error, 15 minutes in so far. @BYK how long is it supposed to take 😃?
Looks like Kafka is not able to come up at all. How much memory do you have on this system? Maybe we should increase our memory requirements? Or maybe you're starved of CPU?
I had the same issue as well. It looks like the cause is that previous versions named the containers `sentry-onprem_XXX` while the latest names them `sentry_onpremise_XXX`, so the script is unable to properly clean them up, leading to conflicts when it tries setting them up again. I fixed this by manually stopping and removing all the `sentry-onprem_*` containers and running the `install.sh` script again.
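A minimal sketch of that cleanup, assuming the stale containers share the `sentry-onprem_` name prefix (the prefix here is illustrative; double-check what `docker ps -a` shows on your host before force-removing anything):

```shell
# Sketch: remove leftover containers from the old "sentry-onprem_" naming
# scheme so install.sh can recreate them under the new names.
OLD_PREFIX="sentry-onprem_"

if command -v docker >/dev/null 2>&1; then
  # -a includes stopped containers, -q prints only the IDs
  ids=$(docker ps -aq --filter "name=${OLD_PREFIX}")
  if [ -n "$ids" ]; then
    # -f stops running containers before removing them
    docker rm -f $ids
  fi
  status="cleaned"
else
  status="docker-not-available"
fi
echo "$status"
```

Note that `--filter "name=..."` matches a substring of the container name, so pick a prefix specific enough not to catch containers you want to keep.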
@OrfeasZ ah, that may happen if you are coming from Sentry 10 BETA. That said, there's some code in `./install.sh` to compensate for that. Anyway, always a good idea to try this, yes.
You might also want to remove volumes. To clean up the current installation, I'd run:

```shell
docker-compose down --volumes
```
I don't have any `docker-onprem_*` container:

```
% sudo docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0a281dca8121 confluentinc/cp-kafka:5.1.2 "/etc/confluent/dock…" 28 minutes ago Up 16 seconds 9092/tcp sentry_onpremise_kafka_1
f32a35e86c98 confluentinc/cp-zookeeper:5.1.2 "/etc/confluent/dock…" 28 minutes ago Up 28 minutes 2181/tcp, 2888/tcp, 3888/tcp sentry_onpremise_zookeeper_1
a48871313f83 yandex/clickhouse-server:19.4 "/entrypoint.sh" 28 minutes ago Restarting (70) 51 seconds ago sentry_onpremise_clickhouse_1
5ffd76a882a5 redis:5.0-alpine "docker-entrypoint.s…" 28 minutes ago Up 28 minutes 6379/tcp sentry_onpremise_redis_1
3e43baf87305 baf2bac6909e "/bin/sh -c 'exec $S…" 2 weeks ago Up 2 weeks 9000/tcp sentry-onpremise_worker_1
c55be6f95e3a baf2bac6909e "/bin/sh -c 'exec $S…" 2 weeks ago Up 2 weeks 9000/tcp sentry-onpremise_cron_1
e65ba465a007 0a43f4d54413 "/entrypoint.sh '0 0…" 2 weeks ago Up 2 weeks 9000/tcp sentry-onpremise_sentry-cleanup_1
8687418db44d baf2bac6909e "/bin/sh -c 'exec $S…" 2 weeks ago Up 2 weeks 0.0.0.0:9000->9000/tcp sentry-onpremise_web_1
f33e2a551fd0 52ff2f70ce97 "/entrypoint.sh '55 …" 2 weeks ago Up 2 weeks 3021/tcp sentry-onpremise_symbolicator-cleanup_1
b7b8fa8e4569 f2d2da71650e "/bin/bash /docker-e…" 2 weeks ago Up 2 weeks 3021/tcp sentry-onpremise_symbolicator_1
a08e9db2d6d8 f5bb4800d474 "/entrypoint.sh '*/5…" 2 weeks ago Up 2 weeks 1218/tcp sentry-onpremise_snuba-cleanup_1
a2956b2b9972 271fe968ae9c "./docker_entrypoint…" 2 weeks ago Up 2 weeks 1218/tcp sentry-onpremise_snuba-api_1
9d4b2fb9b026 271fe968ae9c "./docker_entrypoint…" 2 weeks ago Up 46 hours 1218/tcp sentry-onpremise_snuba-replacer_1
63d707e3e0b4 271fe968ae9c "./docker_entrypoint…" 2 weeks ago Up 46 hours 1218/tcp sentry-onpremise_snuba-consumer_1
cc2157f23291 confluentinc/cp-kafka:5.1.2 "/etc/confluent/dock…" 2 weeks ago Up 2 weeks 9092/tcp sentry-onpremise_kafka_1
bad751f33e63 a49ff3e0d85f "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 6379/tcp sentry-onpremise_redis_1
a0c5d03b8ac8 memcached:1.5-alpine "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 11211/tcp sentry-onpremise_memcached_1
f146b77c59ed 66f224075d8f "docker-entrypoint.s…" 2 weeks ago Up 2 weeks 25/tcp sentry-onpremise_smtp_1
f4619775b261 confluentinc/cp-zookeeper:5.1.2 "/etc/confluent/dock…" 2 weeks ago Up 2 weeks 2181/tcp, 2888/tcp, 3888/tcp sentry-onpremise_zookeeper_1
4ccdde74b4e5 yandex/clickhouse-server:19.4 "/entrypoint.sh" 2 weeks ago Up 2 weeks 8123/tcp, 9000/tcp, 9009/tcp sentry-onpremise_clickhouse_1
```
Ran `sudo docker-compose down --volumes` and got:

```
Stopping sentry_onpremise_kafka_1 ... done
Stopping sentry_onpremise_zookeeper_1 ... done
Stopping sentry_onpremise_clickhouse_1 ... done
Stopping sentry_onpremise_redis_1 ... done
Removing sentry_onpremise_snuba-api_run_28a5c1c68564 ... done
Removing sentry_onpremise_kafka_1 ... done
Removing sentry_onpremise_zookeeper_1 ... done
Removing sentry_onpremise_clickhouse_1 ... done
Removing sentry_onpremise_redis_1 ... done
Removing network sentry_onpremise_default
Volume sentry-data is external, skipping
Volume sentry-postgres is external, skipping
Volume sentry-redis is external, skipping
Volume sentry-zookeeper is external, skipping
Volume sentry-kafka is external, skipping
Volume sentry-clickhouse is external, skipping
Volume sentry-symbolicator is external, skipping
Removing volume sentry_onpremise_sentry-secrets
Removing volume sentry_onpremise_sentry-smtp
Removing volume sentry_onpremise_sentry-zookeeper-log
Removing volume sentry_onpremise_sentry-kafka-log
Removing volume sentry_onpremise_sentry-smtp-log
```
Running `sudo ./install.sh` after that still falls into the infinite loop of creating/connecting to Kafka.
I have 8 GB of RAM and a 4-core CPU.
Install logs: sentry_install_log-2020-01-10_14-29-38.txt
@maximal Sorry, there was a typo in my comment. I meant to write `sentry-onprem`. Try deleting all your `sentry-onpremise_*` containers (the ones with the dash, not the underscore) and possibly also your `sentry-kafka` volume (if it exists), and then run the `install.sh` script again.
Removed all `sentry-onpremise_*` containers:

```shell
sudo docker container rm sentry-onpremise_worker_1 sentry-onpremise_cron_1 sentry-onpremise_sentry-cleanup_1 sentry-onpremise_web_1 sentry-onpremise_symbolicator-cleanup_1 sentry-onpremise_symbolicator_1 sentry-onpremise_snuba-cleanup_1 sentry-onpremise_snuba-api_1 sentry-onpremise_snuba-replacer_1 sentry-onpremise_snuba-consumer_1 sentry-onpremise_kafka_1 sentry-onpremise_redis_1 sentry-onpremise_memcached_1 sentry-onpremise_smtp_1 sentry-onpremise_zookeeper_1 sentry-onpremise_clickhouse_1 --force
```

Removed the `sentry-kafka` volume:

```shell
sudo docker volume rm sentry-kafka
```

Ran `sudo ./install.sh`. Finally, it worked perfectly! `sudo docker-compose up -d` and we're done!

Done. Thank you all, guys!
By the way, there was a different error at the end of the install:
```
Creating sentry_onpremise_snuba-replacer_1 ... done
Creating sentry_onpremise_snuba-api_1 ... done
Creating sentry_onpremise_snuba-consumer_1 ... done
Creating sentry_onpremise_memcached_1 ... done
Creating sentry_onpremise_symbolicator_1 ... done
Creating sentry_onpremise_smtp_1 ... done
Creating sentry_onpremise_postgres_1 ... done
12:27:01 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
12:27:22 [INFO] sentry.plugins.github: apps-not-configured
!! Configuration error: sentry.search service failed to call validate()
'module' object has no attribute 'SnubaSearchBackend'
Cleaning up...
```
@maximal To fix that error, go into your `sentry/sentry.conf.py` file and edit the `SENTRY_SEARCH` option to read as follows:

```python
SENTRY_SEARCH = "sentry.search.snuba.EventsDatasetSnubaSearchBackend"
```

Then, re-run `install.sh`.
@OrfeasZ, thanks! Set the variable; `install.sh` now yields:
```
Creating sentry_onpremise_snuba-replacer_1 ... done
Creating sentry_onpremise_snuba-consumer_1 ... done
Creating sentry_onpremise_snuba-api_1 ... done
12:40:17 [WARNING] sentry.utils.geo: settings.GEOIP_PATH_MMDB not configured.
12:40:37 [INFO] sentry.plugins.github: apps-not-configured
Traceback (most recent call last):
  File "/usr/local/bin/sentry", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/__init__.py", line 164, in main
    cli(prog_name=get_prog(), obj={}, max_content_width=100)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/decorators.py", line 30, in inner
    return ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/commands/upgrade.py", line 168, in upgrade
    _upgrade(not noinput, traceback, verbosity, not no_repair)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/commands/upgrade.py", line 121, in _upgrade
    _migrate_from_south(verbosity)
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/commands/upgrade.py", line 93, in _migrate_from_south
    if not _has_south_history(connection):
  File "/usr/local/lib/python2.7/site-packages/sentry/runner/commands/upgrade.py", line 78, in _has_south_history
    cursor = connection.cursor()
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 233, in cursor
    cursor = self.make_cursor(self._cursor())
  File "/usr/local/lib/python2.7/site-packages/sentry/db/postgres/decorators.py", line 44, in inner
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sentry/db/postgres/base.py", line 99, in _cursor
    cursor = super(DatabaseWrapper, self)._cursor()
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 204, in _cursor
    self.ensure_connection()
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
    self.connect()
  File "/usr/local/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
    self.connect()
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 171, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/usr/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: FATAL: the database system is starting up
Cleaning up...
```
Is that fine?
@maximal That doesn't look right, but I'm not sure.
Anyway, the issue is closed. Thank you for your time!
@maximal that doesn't look right. Probably your postgres container is taking its time spinning up. I'd try running the script again and see what happens.
Seems like the Postgres volume got corrupted during the endless attempts to fight this issue.
The logs were:
```
postgres_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres_1 |
postgres_1 | LOG: database system was interrupted while in recovery at 2020-01-10 12:43:56 UTC
postgres_1 | HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
postgres_1 | LOG: database system was not properly shut down; automatic recovery in progress
postgres_1 | LOG: redo starts at 0/645760B8
postgres_1 | WARNING: specified item offset is too large
postgres_1 | CONTEXT: xlog redo at 0/645A98D8 for Btree/INSERT_LEAF: off 70
postgres_1 | PANIC: btree_insert_redo: failed to add item
postgres_1 | CONTEXT: xlog redo at 0/645A98D8 for Btree/INSERT_LEAF: off 70
postgres_1 | LOG: startup process (PID 25) was terminated by signal 6: Aborted
postgres_1 | LOG: aborting startup due to startup process failure
postgres_1 | LOG: database system is shut down
```
Unfortunately, I've already made a clean fresh install. But this Kafka issue is easily solved by https://github.com/getsentry/onpremise/issues/331#issuecomment-573013630 and https://github.com/getsentry/onpremise/issues/331#issuecomment-573017908; thus, closed.
Hi, I'm having the exact same problem:

```
+ exec gosu snuba snuba bootstrap --force
2020-05-05 18:49:59,366 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
...
```

After `snuba bootstrap`, Kafka never wakes up and the script times out.
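When the bootstrap loops like this, it usually means the Kafka broker itself is crash-looping rather than snuba being at fault. A quick way to check (a hedged sketch; it assumes you run it from the onpremise checkout, where the compose services are named `kafka` and `zookeeper`):

```shell
# Sketch: check whether the Kafka broker is actually up before blaming snuba.
if command -v docker-compose >/dev/null 2>&1; then
  # Is the kafka container up, or stuck in a restart loop?
  docker-compose ps kafka
  # The last log lines usually show why the broker keeps exiting
  docker-compose logs --tail=50 kafka zookeeper
  status="inspected"
else
  status="docker-compose-not-available"
fi
echo "$status"
```

If the broker logs show it dying on startup (rather than snuba failing to reach it), the fix belongs on the Kafka/ZooKeeper side, e.g. wiped volumes or a pinned image version.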
Having this issue again while upgrading from `10.0.0.dev0` to the latest dev version. Removing the `sentry-kafka` volume and the `sentry_onpremise_kafka_1` container doesn't help:

```shell
docker-compose down
docker container rm sentry_onpremise_kafka_1
docker volume rm sentry-kafka
```

`./install.sh` yields:
```
... ... ...
+ snuba bootstrap --help
+ set -- snuba bootstrap --force
+ set gosu snuba snuba bootstrap --force
+ exec gosu snuba snuba bootstrap --force
2020-05-05 23:19:38,870 Connection to Kafka failed (attempt 0)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-05-05 23:19:40,873 Connection to Kafka failed (attempt 1)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
... ... ...
2020-05-05 23:21:35,085 Connection to Kafka failed (attempt 58)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
2020-05-05 23:21:37,088 Connection to Kafka failed (attempt 59)
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 11, in <module>
    load_entry_point('snuba', 'console_scripts', 'snuba')()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 56, in bootstrap
    client.list_topics(timeout=1)
cimpl.KafkaException: KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
Cleaning up...
```
“Reverting” #465 helps indeed. In `docker-compose.yml`:

```
Line 66: 'confluentinc/cp-zookeeper:5.5.0' → 'confluentinc/cp-zookeeper:5.1.2'
Line 80: 'confluentinc/cp-kafka:5.5.0' → 'confluentinc/cp-kafka:5.1.2'
```

After that, `./install.sh` runs just fine, doing migrations at the end and exiting normally.
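That version pin can be applied with a `sed` one-liner instead of editing by hand. The snippet below demonstrates the substitution on a throwaway temp file so it's safe to try; run the same `sed` against your real `docker-compose.yml` (the exact quoting of the `image:` lines may differ in your checkout):

```shell
# Demo the image-version pin on a temporary file; point sed at your
# real docker-compose.yml once the substitution looks right.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
    image: 'confluentinc/cp-zookeeper:5.5.0'
    image: 'confluentinc/cp-kafka:5.5.0'
EOF

# -i.bak edits in place and keeps a .bak backup (works with GNU and BSD sed)
sed -i.bak \
  -e 's|confluentinc/cp-zookeeper:5.5.0|confluentinc/cp-zookeeper:5.1.2|' \
  -e 's|confluentinc/cp-kafka:5.5.0|confluentinc/cp-kafka:5.1.2|' \
  "$tmp"

cat "$tmp"
```

Using `|` as the `s` delimiter avoids having to escape the slashes in the image names.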
> “Reverting” #465 helps indeed. In `docker-compose.yml`:
> Line 66: 'confluentinc/cp-zookeeper:5.5.0' → 'confluentinc/cp-zookeeper:5.1.2'
> Line 80: 'confluentinc/cp-kafka:5.5.0' → 'confluentinc/cp-kafka:5.1.2'
> After that, `./install.sh` runs just fine, doing migrations at the end and exiting normally.

Your methods work, thanks!
cc @BYK, since you made that commit without automated tests.
@kocoten1992

> cc @BYK , cause you did that commit - without automation test.
It is not easy to test every possible case. We do have some basic tests to ensure at least Sentry comes up and runs, but having upgrade tests would be a bit too costly. We may invest in that in the near future. In the meantime, if you have the time to help, we'd appreciate any contributions to make these safer.
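For reference, the kind of "is Sentry up and running" smoke check mentioned above could look like this (a sketch only: it assumes the web service listens on `localhost:9000`, the default port in this setup, and merely checks that something answers HTTP there):

```shell
# Sketch: minimal post-install smoke check against the Sentry web service.
URL="http://localhost:9000/"

if command -v curl >/dev/null 2>&1; then
  # -s silences progress, -o discards the body, -w prints the HTTP status;
  # an empty/failed request is normalized to "000" (nothing answered)
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$URL" 2>/dev/null || true)
  [ -n "$code" ] || code="000"
else
  code="curl-not-available"
fi
echo "$code"
```

A real upgrade test would go further (send an event, query it back), but even a status-code probe in CI catches a web container that never binds its port.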
@kocoten1992, see https://github.com/getsentry/onpremise/issues/472#issuecomment-624719355
Since what is being experienced here now has a different cause, I'm closing this issue in favor of #472. We can keep the discussion there.
Hey, sorry for my attitude. I accidentally ran a full docker prune and took it out on you. Sorry!
@kocoten1992 No worries. My offer still stands if you have the time to contribute 🙂
I had a clean install of `10.0.0.dev0`. Pulled the repo and ran `sudo ./install.sh`. The result is an infinite loop of:

…

Now the `web` service doesn't start, telling something like:

…

Restarting Docker doesn't help, nor does killing the process listening on port 9000.