douban / pymesos

A pure python implementation of Mesos scheduler and executor
BSD 3-Clause "New" or "Revised" License

MesosOperatorMasterDriver cannot subscribe to events #104

Closed irvinlim closed 6 years ago

irvinlim commented 6 years ago

Using the latest version of PyMesos, 0.3.4, I cannot receive any events through the operator API. From my initial investigation, it looks like the SUBSCRIBE request is never sent.

On PyMesos 0.3.3, after calling driver.start(), I get a log message similar to the following:

Operator client subscribed with cluster state: ...

However, on 0.3.4, no such log appears, and the operator is not notified of any task or agent updates.

I believe this can be reproduced with the basic examples provided. Falling back to PyMesos 0.3.3 fixes the issue, which suggests that #98 introduced a regression.
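For anyone who wants to guard against the affected release at startup, here is a minimal sketch. It assumes only what this thread reports (0.3.4 is broken, 0.3.3 and 0.3.5 work); the helper names `is_affected` and `installed_pymesos_version` are hypothetical and not part of PyMesos.

```python
# Versions reported as broken in this thread; extend only if you confirm others.
AFFECTED_VERSIONS = {"0.3.4"}


def is_affected(version):
    """Return True if the given PyMesos version string is a known-bad release."""
    return version.strip() in AFFECTED_VERSIONS


def installed_pymesos_version():
    """Return the installed PyMesos version, or None if it cannot be determined."""
    try:
        # importlib.metadata is only available on Python 3.8+.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version("pymesos")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pymesos_version()
    if found is not None and is_affected(found):
        print("pymesos %s cannot subscribe to operator events; "
              "use 0.3.3 or 0.3.5 instead" % found)
```

The check is deliberately an exact match rather than a range, since the thread only establishes the behavior of these three releases.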

irvinlim commented 6 years ago

I added some logging in process.py after line 300:

while True:
    logger.info('process=%s,conn=%s,master=%s', self, conn, self._master)
    if not conn and self._master:
        conn = Connection(self._master, self)
        logger.info('process=%s, conn=%s', self, conn)

Notice how the self._master variable is always None for MesosOperatorMasterDriver:

framework_1      | 2018-07-27 18:04:40.172|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>,conn=None,master=None
framework_1      | 2018-07-27 18:04:40.177|INFO|process=<pymesos.operator_v1.MesosOperatorMasterDriver object at 0x7fbf17bea650>,conn=None,master=None
framework_1      | 2018-07-27 18:04:40.183|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>,conn=None,master=mesos-master:5050
framework_1      | 2018-07-27 18:04:40.187|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>, conn=<pymesos.process.Connection object at 0x7fbf17beaf10>
mesos-master_1   | I0727 10:04:40.189290    15 http.cpp:1185] HTTP POST for /master/api/v1/scheduler from 172.26.0.5:49914
mesos-master_1   | I0727 10:04:40.189538    15 master.cpp:2610] Received subscription request for HTTP framework 'scheduler'
mesos-master_1   | I0727 10:04:40.189677    15 master.cpp:2745] Subscribing framework 'scheduler' with checkpointing disabled and capabilities [ TASK_KILLING_STATE, GPU_RESOURCES ]
mesos-master_1   | I0727 10:04:40.189718    15 master.cpp:7195] Updating framework 0bc93f09-207f-4528-9b20-d2f0527286c7-0000 (scheduler) with roles {  } suppressed
framework_1      | 2018-07-27 18:04:40.188|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>,conn=<pymesos.process.Connection object at 0x7fbf17beaf10>,master=mesos-master:5050
framework_1      | 2018-07-27 18:04:40.190|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>,conn=<pymesos.process.Connection object at 0x7fbf17beaf10>,master=mesos-master:5050
framework_1      | 2018-07-27 18:04:40.195|INFO|scheduler_re_registered|master={'hostname': 'mesos-master', 'port': 5050, 'version': u'1.5.0'}
framework_1      | 2018-07-27 18:04:40.195|INFO|process=<pymesos.scheduler.MesosSchedulerDriver object at 0x7fbf17bea450>,conn=<pymesos.process.Connection object at 0x7fbf17beaf10>,master=mesos-master:5050
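
To illustrate why a master that is never set would explain the missing SUBSCRIBE, here is a toy model of the loop condition quoted above. This is not PyMesos code: `ToyProcess`, `change_master`, and `run_once` are hypothetical names, and a string stands in for the real `Connection` object.

```python
class ToyProcess(object):
    """Toy stand-in for a driver whose loop connects only once a master is known."""

    def __init__(self):
        self._master = None      # nothing has told us where the master is yet
        self.subscribed = False  # flips to True when a connection is opened

    def change_master(self, master):
        # Hypothetical setter; in the logs above only the scheduler driver
        # ever ends up with a non-None master.
        self._master = master

    def run_once(self, conn):
        # Same guard as in the logged snippet: connect only if there is
        # no connection yet AND a master address has been set.
        if not conn and self._master:
            conn = "connection-to-%s" % self._master
            self.subscribed = True
        return conn


if __name__ == "__main__":
    operator = ToyProcess()
    conn = None
    for _ in range(3):
        conn = operator.run_once(conn)
    print(operator.subscribed)   # master never set, so no connection was made

    operator.change_master("mesos-master:5050")
    conn = operator.run_once(conn)
    print(operator.subscribed)   # master set, so the guard now passes
```

Under this model, a driver whose master stays `None` loops forever without opening a connection, which matches the operator driver's lines in the log.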

ariesdevil commented 6 years ago

I reproduced this problem with operator_v1.py from the examples. I noticed that _send fails only when trying to connect to a Mesos slave directly. Is this what you mean?

irvinlim commented 6 years ago

See https://github.com/irvinlim/pymesos-0.3.4-bugrepro for what I mean. The operator is not able to subscribe to any events when using version 0.3.4.

Using the example code, I connected to the master directly with python operator_v1.py mesos-master:5050.

ariesdevil commented 6 years ago

Thanks very much for your bug report repo.

irvinlim commented 6 years ago

Thank you for the fix! 👍

Can I get you to publish a new version of the package when you can? Thank you!

ariesdevil commented 6 years ago

Done (https://github.com/douban/pymesos/releases/tag/0.3.5)

irvinlim commented 6 years ago

@ariesdevil Thanks a lot! 🙏