daskos / mentor

Extensible Python Framework for Apache Mesos
Apache License 2.0

HTTP Mesos interface #49

Open Arttii opened 8 years ago

Arttii commented 8 years ago

Hi,

Not sure if this is the right place to ask this, but are there any thoughts on switching to the new HTTP interface without protobuf? I was thinking about working on this myself, but I'm not sure if there are any plans.

Feel free to close this if this is the wrong place to ask.

Thanks

kszucs commented 8 years ago

Hey Arttii,

This is the right place :) I'm planning to implement an HTTP client, but don't have the time right now. I'd really love to see a pure Python package without mesos.native.

There is a draft by mrocklin https://gist.github.com/mrocklin/72cfd17a9f097e7880730d66cbde16a0 implemented with tornado. I think this is the right approach to start with. BTW there is a promising library by douban https://github.com/douban/pymesos which could be an alternative to mesos.native, I haven't tried it yet.

Satyr doesn't have a development guide, but I can create one if You're interested. First of all I should merge the unified branch...

Are You familiar with satyr?

Arttii commented 8 years ago

Yes, I played with satyr a bit and looked over most of the code. I've tried pymesos before, when it was still using the native interface. It seems to me pymesos is quite actively developed. Were you thinking more of re-implementing the client or integrating pymesos as a dependency?

It seems pymesos is quite compatible with the native implementation. Potentially we could just add it to the proxies module. It also has functionality for detecting a new master, which is quite nice.

There is also this https://github.com/massenz/zk-mesos/blob/develop/notebooks/Demo-API.ipynb

A dev guide would be nice :D

kszucs commented 8 years ago

According to .travis.yml, pymesos even supports python2.7, python3.5 and pypy. It looks like a reasonable decision to use pymesos as a dependency. Hopefully it will become the standard Python client for mesos.

What about testing? Satyr currently uses drone with a fully functional mesos setup. I'll make an open hosted version of drone. Are You OK using drone or should we switch to travis?

Did You take a look at https://github.com/lensacom/satyr/pull/45 ? There are a lot of breaking changes. I'd like to create an interface for reusable executors. There is already support for unified containerizers too.

Arttii commented 8 years ago

Python 3.5 support is a big thing in my opinion. Drone is fine, I guess. I have not used it before, but it should not be a problem.

Ya, I took a look, but wasn't sure what its status was. I guess we need to wait for the merge before starting on the new feature, or do you think that branch is stable enough?

kszucs commented 8 years ago

That is stable enough. I'd like to use that instead of the master. If You want to test with drone You need to install drone:0.5.

Arttii commented 8 years ago

I think I have some time to work on this now, I'll look into integrating pymesos into everything.

The only thing that annoys me slightly is the dependency on zkpython vs kazoo.

Edit: Actually no it just defaults to zkpython if you have it, for performance I guess.

Arttii commented 8 years ago

What is your opinion on keeping the Message proxy shim versus transitioning fully to dictionaries, as pymesos does?

kszucs commented 8 years ago

I've factored out the protobuf wrapper https://github.com/kszucs/proxo

After some digging I think we should implement an HTTP client based on mrocklin's tornado gist instead of using pymesos. I intend to drop the old mesos interface and wrap the HTTP message types directly http://mesos.apache.org/documentation/latest/scheduler-http-api/ .

This way we can keep the additional functionality, e.g. comparing an offer and a task: task <= offer

We can use pymesos as a crutch.
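As an illustration of the `task <= offer` comparison mentioned above, here is a minimal sketch. The `Resources` class and its three fields are assumptions made for this example; satyr's actual message wrappers are not shown in this thread and may differ.

```python
class Resources(object):
    """Hypothetical resource bundle; not satyr's actual wrapper class."""

    def __init__(self, cpus=0.0, mem=0.0, disk=0.0):
        self.cpus = cpus
        self.mem = mem
        self.disk = disk

    def __le__(self, other):
        # A task fits into an offer only if every resource dimension fits.
        return (self.cpus <= other.cpus and
                self.mem <= other.mem and
                self.disk <= other.disk)


task = Resources(cpus=1, mem=512)
offer = Resources(cpus=4, mem=2048, disk=1000)
fits = task <= offer
```

With these values `fits` is `True`, while `offer <= task` would be `False`.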

kszucs commented 8 years ago

I'm not sure about the interface though. How would You use satyr to create a new mesos framework? Would You implement it with event handlers like this JS client, or just compose/configure like Fenzo does?

Arttii commented 8 years ago

Well, pymesos seems to be an almost 1-to-1 replacement of the native interface, just with dicts for everything instead of protobuf definitions. It has all the callbacks handled. I guess we could re-implement the pymesos HTTP interface in tornado, but I'm not sure what the benefit of that would be, apart from being async and sticking to internal implementations.

Before I found satyr I was thinking of going the JS way, but now it seems the benefit of Satyr would be to build on top of pymesos with a higher-level API, with additionally implemented default schedulers/executors and the placement functionality. Like it is now, basically. Or did you have other ideas?

Basically we could completely drop the proxy layer and replace it with pymesos.

Arttii commented 8 years ago

So do you think it's a better idea to keep all the wrappers in proxies.messages? It seems like a nice approach. I started doing that as well. It is going quite easily, so I might have a working solution quite soon. With pymesos, that is.

Also having looked at it again. The Fenzo approach is quite nice.

Arttii commented 7 years ago

So sadly I did not have much time to work on this in the last few weeks. Do you still think re-implementing with tornado is the way to go?

kszucs commented 7 years ago

Me neither :) so I can live with pymesos. Although we should inherit from pymesos' scheduler and executor instead of proxying them.

Arttii commented 7 years ago

Ya, that's what I started doing. The annoying thing is that using async requests might be nicer long term, but pymesos has a bunch of stuff for detecting master failover and error handling. We can kinda "borrow" some of that; at least the network code would be much simpler with tornado or asyncio or something.

Also, I am definitely for providing some kind of "fenzo"-like placement strategy suggester. Most of the functions for bin packing etc. are already there. The API has to be considered though, as you said.
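To make the placement idea concrete, here is a minimal first-fit-decreasing sketch. It is not satyr's or Fenzo's implementation; the function name and the plain `(cpus, mem)` tuples are assumptions chosen to keep the example self-contained.

```python
def first_fit_decreasing(tasks, offers):
    """Assign tasks to offers, largest tasks first.

    tasks:  dict of task name -> (cpus, mem)
    offers: dict of offer id  -> (cpus, mem)
    Returns (placement, unplaced), where placement maps each offer id to
    the list of task names put on it.
    """
    remaining = {oid: list(res) for oid, res in offers.items()}
    placement = {oid: [] for oid in offers}
    unplaced = []
    ordered = sorted(tasks.items(), key=lambda kv: kv[1], reverse=True)
    for name, (cpus, mem) in ordered:
        for oid in sorted(remaining):
            free = remaining[oid]
            if cpus <= free[0] and mem <= free[1]:
                free[0] -= cpus
                free[1] -= mem
                placement[oid].append(name)
                break
        else:
            unplaced.append(name)  # no offer had enough room left
    return placement, unplaced


tasks = {"a": (2, 1024), "b": (1, 512), "c": (4, 4096)}
offers = {"o1": (4, 4096), "o2": (2, 2048)}
placement, unplaced = first_fit_decreasing(tasks, offers)
```

In this run task `c` fills `o1`, task `a` lands on `o2`, and `b` stays unplaced because `o2` has no CPUs left.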

Any thoughts about nicely wrapping the json dicts?

Arttii commented 7 years ago

So I got very distracted and started playing around with the networking bit based on the Tornado implementation. I made some prototypes in asyncio and plain requests, also mixing in rxpy and aioreactive. The nice thing about rxpy and reactive is the subscription-based approach: people could subscribe to the events they are interested in and run some chained processing on them in a nice way. The concern with aioreactive is losing 2.7 compatibility, but I kinda like the API.

I guess this would be similar to the mesosphere rxjava client. I just need to finish the retry logic and also add zookeeper master detection into the pipeline, and most of the "client" code could be there.
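The subscription idea can be sketched without any reactive library. The `EventStream` class below is hypothetical and much simpler than rxpy or aioreactive; it only shows handlers subscribing to the event types they care about. `OFFERS` and `UPDATE` are event types from the Mesos scheduler HTTP API.

```python
from collections import defaultdict


class EventStream(object):
    """Hypothetical dispatcher: routes event dicts to subscribed handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def emit(self, event):
        # Events without subscribers are silently dropped.
        for handler in self._handlers.get(event.get("type"), []):
            handler(event)


seen = []
stream = EventStream()
stream.subscribe("OFFERS", lambda e: seen.append(len(e["offers"]["offers"])))
stream.emit({"type": "OFFERS", "offers": {"offers": [{}, {}]}})
stream.emit({"type": "HEARTBEAT"})
```

After these two events `seen` contains a single entry, the number of offers in the first event.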

kszucs commented 7 years ago

The same could be achieved with tornado and toolz while keeping compatibility with python 2.7. I think 2.7 support is a must have. We can adapt nice design patterns from distributed.

Arttii commented 7 years ago

Ya I agree, so I ended up prototyping more with RxPy and normal requests and also tornado async client. Somehow I forgot about toolz, even though I use it a bunch. Thing is, I always thought it was more for lazy evaluating chained processing pipelines as opposed to a general Stream/Observable concept, but toolz is defo much more pythonic. Could you point me at some examples from dask/distributed?

I have zookeeper leader election, connection retry and leader redirect discovery all working at the moment. So if a leader dies or anything, it transparently resubscribes and continues on. It's quite nice. The client requests to mesos also work like a stream and are non-blocking.

I can share some of the prototypes as well, they're just a bit all over the place code wise. I have some free time now, so I will play around with the toolz and tornado approach as well.
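The prototype's retry logic is not shown in this thread. As a rough sketch of what transparent resubscription with exponential backoff could look like, assuming a hypothetical `connect` callable that raises `IOError` when the master is unreachable:

```python
import time


def resubscribe_with_backoff(connect, max_attempts=5, base_delay=0.5,
                             max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    `connect` is a hypothetical callable supplied by the caller; `sleep`
    is injectable so the logic can be exercised without real waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except IOError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)


calls = {"n": 0}
waits = []


def flaky_connect():
    # Fails twice, then succeeds, to simulate a master failover.
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("master unreachable")
    return "subscribed"


result = resubscribe_with_backoff(flaky_connect, sleep=waits.append)
```

In a real client the blocking `sleep` would be replaced by the event loop's own delay mechanism.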

kszucs commented 7 years ago

Great-great news :)

https://github.com/dask/distributed/blob/master/distributed/client.py#L305 https://github.com/dask/distributed/blob/master/distributed/deploy/local.py

utilities https://github.com/dask/distributed/blob/master/distributed/utils.py

I'm curious about the changes!

Arttii commented 7 years ago

So I got to the point of wrapping the JSON calls. What do you think is a nice approach we could use? Do we need schema validation, for example, to check that the dicts are constructed properly, or should we just keep it flexible? If so, JSON schemas? Keep protobuf somehow?

Another question is whether to align with the current Python interface approach or with the event-style approach, more like the JavaScript scheduler. I am not sure which one is more pythonic.

I will share the prototype soon. It's pure Tornado now, done in a similar way to dask.

kszucs commented 7 years ago

How does mesos handle malformed requests? If the master validates we can keep it flexible.

The current interface uses callbacks like handlers in the event style approach too. We'll break the current interface no matter what.

I'm really curious!

Arttii commented 7 years ago

To give you an update, I have the scheduler and executors working, with some problems on the executor side. Currently I just use dicts for the messages, but the API is very similar to satyr otherwise. Basically, if we decide on a nice way to do the wrappers for the dicts à la proto, we might not even break the API that drastically.

I have been busy with work the last few days so did not have time to share the prototype yet, but soon.

Arttii commented 7 years ago

So the actual prototype, still very messy. And there is some really weird issue in the executor, where the tornado AsyncHTTPClient used to send messages back to the executor constantly times out with 599 for no real reason. If I send the payload back directly it works fine.

Edit. Using pycurl instead of the default HTTPClient solves the issue. I think there are thread safety concerns around the place that I have to investigate to make it work properly all the time. This is my first time writing tornado, so maybe this is expected.

Edit. I fixed it. I was just doing something stupid and not thread safe.

kszucs commented 7 years ago

Hey! Great! :)

I'll look into it in the next couple of days.

kszucs commented 7 years ago

How do You test the implementation?

Arttii commented 7 years ago

Running scheduler.py will start a framework that accepts one offer and launches the executor in executor.py, which basically receives a message, sends one and finishes.

Arttii commented 7 years ago

So I also used the Map class from the messages to create a wrapping shim around the JSON dicts and turn them into objects, but to be honest it's creating a lot of headaches, as the magical behavior is kinda getting in the way.

Do you have any better ideas on how to do this?

The Satyr API is mostly preserved, so if we find a nice way to do this I guess we could port things over. Do you think it warrants separating the core scheduling logic from the more high-level stuff?

Arttii commented 7 years ago

So I decided to separate the core driver logic from satyr into the malefico package. This way it's a bit cleaner and we can experiment with using Tornado or another engine like asyncio, twisted or curio for the async stuff.

The actual satyr version interfacing with this is here https://github.com/Arttii/satyr/tree/malefico.

It seems to work, but I cannot settle on a nice way to wrap the messages. With the current implementation there are problems with the Queue Executor, as the cloudpickled functions do not reach the executor. Well, they do, but due to the way I am doing the conversion now, the data field is always overwritten with an empty tuple (fn,args,kwargs). I think this is due to the way I adapted the Map class to serve as the wrapper around Messages. It seems very unclean. I think imposing a strict schema and providing a to_dict method so that malefico can serialize to JSON would be the best course of action.

Some of the tests need reworking at the moment as well.
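A strict schema with a `to_dict` method, as suggested above, could be sketched in plain Python like this. The `Message` base class and its `fields` attribute are hypothetical and not taken from satyr or malefico; the point is only that unknown fields are rejected and nested messages serialize recursively, with no magic attribute access.

```python
import json


class Message(object):
    """Hypothetical base class: a fixed field list instead of a magic Map."""

    fields = ()

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self.fields)
        if unknown:
            raise TypeError("unknown fields: %s" % sorted(unknown))
        for name in self.fields:
            setattr(self, name, kwargs.get(name))

    def to_dict(self):
        out = {}
        for name in self.fields:
            value = getattr(self, name)
            if value is None:
                continue  # unset fields are left out of the payload
            out[name] = value.to_dict() if isinstance(value, Message) else value
        return out


class TaskID(Message):
    fields = ("value",)


class TaskInfo(Message):
    fields = ("name", "task_id", "data")


info = TaskInfo(name="demo", task_id=TaskID(value="t-1"))
payload = json.dumps(info.to_dict(), sort_keys=True)
```

Because each class lists its fields explicitly, a `data` value set by the caller is stored as given and is never replaced by a default.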

kszucs commented 7 years ago

Agree, I like the pluggable engine idea!

In the next week I'll take a look at malefico, maybe create some tests, and overview the message handling in Satyr. I've created an organizational issue #50, please take a look at it.

Arttii commented 7 years ago

So I am thinking the only nice way to wrap the messages is to use something like attrs and implement the individual message types and implicit conversion inside malefico or satyr to do as_dict or something. We are then free to add validation or whatever we want really. Something like this:

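import attr

@attr.s
class TaskID(object):
    value = attr.ib()

@attr.s
class TaskInfo(object):
    # Convert the nested dict into a TaskID on construction.
    # (`converter` is the current name of the older `convert` argument.)
    task_id = attr.ib(converter=lambda b: TaskID(**b))
    y = attr.ib()

c2 = TaskInfo(task_id={'value': 1}, y=2)
attr.asdict(c2)
{'task_id': {'value': 1}, 'y': 2}

kszucs commented 7 years ago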

Excellent, this is what I was looking for! A declarative way to define messages.