bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

Install Python 3 in Docker image #2492

Closed by muffix 3 years ago

muffix commented 3 years ago

Description

Updates the OpenTSDB Dockerfile to install Python 3. Later versions of the Alpine base image require an explicit Python version, because the unversioned python package no longer exists.

Fixes #2491

jmelosegui commented 3 years ago

Hi @muffix, I tested your changes on my local machine and I am getting a Connection refused error when connecting to OpenTSDB.

If I look at the logs for the opentsdb container, this is what I can see:

2020-09-23T12:10:50.481113900Z 2020-09-23 12:10:50,480 INFO Set uid to user 0 succeeded
2020-09-23T12:10:50.482775500Z 2020-09-23 12:10:50,482 INFO supervisord started with pid 1
2020-09-23T12:10:51.489117500Z 2020-09-23 12:10:51,488 INFO spawned: 'hbase' with pid 7
2020-09-23T12:10:51.490212300Z 2020-09-23 12:10:51,489 INFO spawned: 'opentsdb' with pid 8
2020-09-23T12:10:51.491080700Z 2020-09-23 12:10:51,490 INFO exited: hbase (exit status 127; not expected)
2020-09-23T12:10:51.492116100Z 2020-09-23 12:10:51,491 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T12:10:52.498672600Z 2020-09-23 12:10:52,497 INFO spawned: 'hbase' with pid 9
2020-09-23T12:10:52.505058600Z 2020-09-23 12:10:52,503 INFO spawned: 'opentsdb' with pid 10
2020-09-23T12:10:52.509655300Z 2020-09-23 12:10:52,509 INFO exited: hbase (exit status 127; not expected)
2020-09-23T12:10:52.513100900Z 2020-09-23 12:10:52,512 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T12:10:54.520886800Z 2020-09-23 12:10:54,519 INFO spawned: 'hbase' with pid 11
2020-09-23T12:10:54.526963000Z 2020-09-23 12:10:54,525 INFO spawned: 'opentsdb' with pid 12
2020-09-23T12:10:54.531328100Z 2020-09-23 12:10:54,530 INFO exited: hbase (exit status 127; not expected)
2020-09-23T12:10:54.534260200Z 2020-09-23 12:10:54,533 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T12:10:57.539153400Z 2020-09-23 12:10:57,538 INFO spawned: 'hbase' with pid 13
2020-09-23T12:10:57.540344500Z 2020-09-23 12:10:57,540 INFO spawned: 'opentsdb' with pid 14
2020-09-23T12:10:57.541482500Z 2020-09-23 12:10:57,541 INFO exited: hbase (exit status 127; not expected)
2020-09-23T12:10:57.542009200Z 2020-09-23 12:10:57,541 INFO gave up: hbase entered FATAL state, too many start retries too quickly
2020-09-23T12:10:57.542303000Z 2020-09-23 12:10:57,542 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T12:10:58.543643700Z 2020-09-23 12:10:58,543 INFO gave up: opentsdb entered FATAL state, too many start retries too quickly

Did you get this to work with your changes?

Thanks in advance.

muffix commented 3 years ago

Yes, @jmelosegui, works for me. I just double-checked with a clean Docker setup. Are you sure you've rebuilt the image? Just running docker-compose -f docker/docker-compose.yml up from the repo root directory did it for me.

Remember that OpenTSDB waits 30 seconds for HBase (in the same container) to start up first. You should be able to reach the HBase master at http://localhost:16010. Then, 30 seconds later, OpenTSDB should respond at http://localhost:4242.
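
Rather than waiting and refreshing by hand, the two endpoints mentioned above can be polled from a small script. The following is only a minimal sketch: the helper names (is_up, wait_for, check_stack) and the 120-second deadline are illustrative assumptions, not part of the Bosun repository, and it treats any HTTP response as "up" without checking that the service is healthy.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if the server at url answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed response.
        return False


def wait_for(url, deadline_s=120.0, interval_s=2.0,
             probe=is_up, sleep=time.sleep, clock=time.monotonic):
    """Poll url until probe(url) succeeds or deadline_s seconds elapse."""
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)


def check_stack():
    """Check the two URLs from the comment above, HBase first (hypothetical helper)."""
    endpoints = [
        ("HBase master", "http://localhost:16010"),
        ("OpenTSDB", "http://localhost:4242"),
    ]
    for name, url in endpoints:
        state = "up" if wait_for(url) else "not reachable"
        print(f"{name}: {state} ({url})")
```

Calling check_stack() after docker-compose up should report each endpoint as up once the startup delay has passed. If it reports "not reachable", the container logs are the next place to look.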

jmelosegui commented 3 years ago

Yeah, not sure what could be wrong here.

These are the 3 containers running:

❯ docker ps | Select -First 4
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                                                                        NAMES
ba6b2a39a074        docker_bosun           "sh -c '/usr/bin/sup…"   18 minutes ago      Up 18 minutes       5252/tcp, 9565/tcp, 0.0.0.0:8070->8070/tcp                                   bosun
47c4644511f5        docker_opentsdb        "sh -c '/usr/bin/sup…"   18 minutes ago      Up 18 minutes       0.0.0.0:4242->4242/tcp, 0.0.0.0:16010->16010/tcp, 0.0.0.0:16030->16030/tcp   opentsdb
4d189c455e29        redis:6                "docker-entrypoint.s…"   18 minutes ago      Up 18 minutes       0.0.0.0:6379->6379/tcp                                                       redis

And this is the log for the opentsdb container, which is still failing after 18 minutes:

❯ docker logs -t opentsdb
2020-09-23T16:29:36.186136400Z 2020-09-23 16:29:36,186 INFO Set uid to user 0 succeeded
2020-09-23T16:29:36.187694900Z 2020-09-23 16:29:36,187 INFO supervisord started with pid 1
2020-09-23T16:29:37.189767600Z 2020-09-23 16:29:37,189 INFO spawned: 'hbase' with pid 7
2020-09-23T16:29:37.190567900Z 2020-09-23 16:29:37,190 INFO spawned: 'opentsdb' with pid 8
2020-09-23T16:29:37.191637800Z 2020-09-23 16:29:37,191 INFO exited: hbase (exit status 127; not expected)
2020-09-23T16:29:37.192315400Z 2020-09-23 16:29:37,192 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T16:29:38.197743200Z 2020-09-23 16:29:38,196 INFO spawned: 'hbase' with pid 9
2020-09-23T16:29:38.199154200Z 2020-09-23 16:29:38,198 INFO spawned: 'opentsdb' with pid 10
2020-09-23T16:29:38.200215600Z 2020-09-23 16:29:38,200 INFO exited: hbase (exit status 127; not expected)
2020-09-23T16:29:38.200967100Z 2020-09-23 16:29:38,200 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T16:29:40.207303000Z 2020-09-23 16:29:40,206 INFO spawned: 'hbase' with pid 11
2020-09-23T16:29:40.210802400Z 2020-09-23 16:29:40,209 INFO spawned: 'opentsdb' with pid 12
2020-09-23T16:29:40.214541400Z 2020-09-23 16:29:40,214 INFO exited: hbase (exit status 127; not expected)
2020-09-23T16:29:40.216771900Z 2020-09-23 16:29:40,216 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T16:29:43.221018800Z 2020-09-23 16:29:43,220 INFO spawned: 'hbase' with pid 13
2020-09-23T16:29:43.221778700Z 2020-09-23 16:29:43,221 INFO spawned: 'opentsdb' with pid 14
2020-09-23T16:29:43.222878100Z 2020-09-23 16:29:43,222 INFO exited: hbase (exit status 127; not expected)
2020-09-23T16:29:43.223387800Z 2020-09-23 16:29:43,223 INFO gave up: hbase entered FATAL state, too many start retries too quickly
2020-09-23T16:29:43.223537400Z 2020-09-23 16:29:43,223 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T16:29:44.224843400Z 2020-09-23 16:29:44,224 INFO gave up: opentsdb entered FATAL state, too many start retries too quickly

I will continue investigating, but I'm not sure where to start looking.

muffix commented 3 years ago

Looks like the HBase process keeps dying and is then respawned by the supervisor. You can look into the HBase logs (or supervisor's) inside the container. This has nothing to do with the fix mentioned in this PR, though.

I just double-checked and reset my Docker installation and rebuilt the containers. Works as expected.
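
As a starting point for that investigation, the supervisord output posted earlier in this thread can be summarized per program. This is a rough sketch, assuming the log lines keep the exact shape shown in the thread. The summarize function and the SAMPLE constant are illustrative names, not part of any tool. Note that exit status 127 conventionally means the command could not be found or executed, but the thread does not show how supervisord launches hbase and opentsdb, so that reading is a hint and not a diagnosis.

```python
import re
from collections import Counter

# Patterns for the two supervisord message shapes seen in the thread.
EXITED = re.compile(
    r"INFO exited: (?P<program>\S+) \(exit status (?P<status>\d+); [^)]*\)"
)
GAVE_UP = re.compile(r"INFO gave up: (?P<program>\S+) entered FATAL state")

# Four lines copied from the log output posted in the thread.
SAMPLE = """\
2020-09-23T12:10:57.541482500Z 2020-09-23 12:10:57,541 INFO exited: hbase (exit status 127; not expected)
2020-09-23T12:10:57.542009200Z 2020-09-23 12:10:57,541 INFO gave up: hbase entered FATAL state, too many start retries too quickly
2020-09-23T12:10:57.542303000Z 2020-09-23 12:10:57,542 INFO exited: opentsdb (exit status 127; not expected)
2020-09-23T12:10:58.543643700Z 2020-09-23 12:10:58,543 INFO gave up: opentsdb entered FATAL state, too many start retries too quickly
"""


def summarize(log_text):
    """Count exits per (program, status) and list programs that went FATAL."""
    exits = Counter()
    fatal = []
    for line in log_text.splitlines():
        match = EXITED.search(line)
        if match:
            exits[(match["program"], int(match["status"]))] += 1
            continue
        match = GAVE_UP.search(line)
        if match:
            fatal.append(match["program"])
    return exits, fatal
```

Feeding the full output of docker logs opentsdb into summarize gives a quick view of which program fails first and with which status, before opening the HBase or supervisor logs inside the container.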

jmelosegui commented 3 years ago

Ok, thanks for your help @muffix!!!