Closed: lydia-duncan closed this 6 years ago
@awallace-cray - once this is merged, can you update the repository description as well? I don't have the permissions for that.
The build is already known to fail (and we are deprecating anyway), so we are cutting it off.
It is a pity that PyChapel is being killed off; PyD works very well. But Go has taken the same view, using multiple processes with channel communication (very UNIX). Could a link to the reasoning thread be provided?
A priori, I see no reason for using Cython in a 0MQ solution, at least on the Chapel side, perhaps there is a pointer to a thread about this as well.
The biggest rationale for PyChapel in my view was "computation in Chapel, visualisation with Jupyter or Python/matplotlib". The goal can be achieved equally well with a multi-process, 0MQ-mediated communication approach as with a direct linkage approach. So whilst losing PyChapel is bad (mostly because of all my previous adverts at PyConUK :-), the way forward is a known successful architecture, not least because Go does it. However, I will have to do a PyConUK session in September 2018 on the new way of interworking.
You're right, I should provide more justification.
Basically what it boiled down to was that I was the only one on the team with a setup where it had provably worked before, but it had been at least half a year since the last time I built it. So when testing broke (see this mail and also the Travis run output on this PR, since the failure was entirely due to Chapel code updates), I realized that my setup needed to be adjusted/recreated in order to develop a fix, and doing so felt like a lot of work when I was already pushing towards a different solution with (hopefully) fewer build limitations.
In terms of the Cython justification, that's kinda spread across a bunch of issues in the Chapel repository (and possibly other places), so I'll just summarize here. I was looking into the various strategies Python uses for interoperating with C, to look for common syntax choices, useful strategies, etc. Cython seemed like a nice setup, and since Chapel compiles down to C (currently), it seemed like it would provide a way to get up and running quickly (i.e. I wouldn't have to build in Python translations to C, or worry as much about portability on different platforms or across Python versions) and would let me focus on the Chapel side of things. And with the right set of mystical incantations/files, you can already use Cython to call any Chapel function PyChapel supported.
ZMQ is useful in that it allows us to leave our multi-locale handling on the Chapel side without having to adjust it for getting spawned from Python.
My hope is to make the use of Cython and ZMQ opaque to the user, by generating much of the boilerplate and compiling the library into Cython library form, so that the user doesn't have to deal with them too much. And using Cython should allow me to get to array support pretty quickly, since I don't have to worry as much about the basic features (because they are pretty much already handled).
I do wonder if the way of interworking between Chapel and Python in this repository was overambitious, and/or too tied to the Chapel implementation to be treated as a separate project.
With Go there has been an attempt to create a Go binding to the C API of Python 2, https://github.com/sbinet/go-python, but I think this is not the right way to go per se. Not only is it too low-level, it is not Python 3. The model of using separate processes and passing messages is clearly a good way of working, as long as the types can be serialized. Otherwise you end up working only with hardware types, cf. MPI. Or maybe this is a good thing?
PyD and Boost.Python provide in-language mechanisms for dealing with the C linkage of symbols between shared objects/DLLs to connect D and C++ respectively to Python, and the other way around. The easy bit is calling D/C++ from Python, since this is dealt with by creating a Python extension. I am sure that the PyD system shows how this can be done with Chapel, as long as Chapel symbols can be forced to C linkage. Calling in to Python from D and C++ is also covered by PyD and Boost.Python respectively, but requires a bit of trampolining to ensure correct initialisation.
Whilst I have a few examples of Boost.Python, I focused mainly on PyD. My experience is entirely on a single (usually multi-processor) machine, and the Chapel hardware base opens up a new question: where does the Python code reside? If the normal Chapel/Python architecture is for the Chapel code to be on one computer and the Python code on another, then distributed-systems concerns apply, and a 0MQ or MPI communication solution would be best, better than the Python extension method, which would effectively be a non-starter. The multi-computer solution clearly works on a single computer, though 0MQ and MPI would not be as good a solution as UNIX sockets.
If Chapel already has a 0MQ binding, then the obvious way forward would seem to be just to use it, and say Chapel and Python interact only by message passing of data. In the modern, microservice-obsessed world, this architecture Just Works™ and can easily be justified. The hard bit will be determining what data structures to support. Perhaps just go with what can be supported using JSON as a "transport layer".
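To make the "JSON as transport layer" idea concrete, here is a minimal Python sketch of what a message between the two processes could look like. The message shape and the function name are purely hypothetical, not from any existing binding; a real version would hand the bytes to a ZMQ socket rather than decoding them in the same process.

```python
import json

# Hypothetical message shape: one side sends a JSON object naming the
# function to call and carrying plain arrays/maps as arguments.
request = {"call": "computeMean", "args": [[1.0, 2.5, 4.0]]}

# Encode to UTF-8 bytes for the wire (what a ZMQ send would carry) ...
wire = json.dumps(request).encode("utf-8")

# ... and decode on the receiving end.
decoded = json.loads(wire.decode("utf-8"))
assert decoded == request

# JSON covers exactly the structures mentioned above: numbers, strings,
# arrays (lists), and dictionaries, nested to any depth.
print(decoded["args"][0])
```

The appeal is that both ends only need a JSON library, which every candidate language already has.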
This of course leaves things open to easy Chapel → (Ruby|R|Julia|Rust|Go|D|Python|Java|Kotlin|Groovy) which is probably a good thing. Ditching the idea of Chapel → Python as a special thing can easily be justified if the result is an architecture where Chapel can connect to many other languages in a microservice (aka distributed system) style approach.
If I say more just now, I shall ramble.
Heh, yeah. We do already have a ZMQ binding (and it accepts plain integers, which Python seems to want converted to strings before sending), so I've got a prototype which uses that already up and running as well.
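As a tiny illustration of that integers-converted-to-strings convention, here is a Python sketch using only plain bytes. The helper names are made up for the sketch; a real version would pass these payloads to a ZMQ socket's send/recv.

```python
# Illustrative convention: stringify the integer, then UTF-8 encode it,
# so the receiving end sees a self-describing decimal byte string.
def encode_int(value: int) -> bytes:
    return str(value).encode("utf-8")

def decode_int(payload: bytes) -> int:
    return int(payload.decode("utf-8"))

msg = encode_int(42)
assert msg == b"42"
assert decode_int(msg) == 42
```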
As you point out, serializing types will be an interesting problem, but I think not insurmountable with a little effort once we allow the exporting of classes/records in Chapel (which is not currently supported in --library compilation, sadly). I'm focusing on arrays at the moment, but classes are being kept in mind (though I might handle the support file generation first so that things are easier for everyone to use).
Python will almost certainly want to generate byte sequences of all values for transmission so as to avoid problems of endianness and of float/double formats (whilst bringing the problems of UTF-8). UTF-8 encoded values are the least problematic way of transferring data in a heterogeneous system.
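A quick standard-library sketch of the two options just described: binary with an explicit byte order, versus UTF-8 text. This is illustrative only, not taken from any Chapel binding.

```python
import struct

value = 3.141592653589793

# Option 1 -- binary: pack as an IEEE 754 double with an explicit
# little-endian byte order ("<d"), so both ends agree on the layout
# regardless of each machine's native endianness.
binary = struct.pack("<d", value)
assert struct.unpack("<d", binary)[0] == value

# Option 2 -- text: render as a decimal string and UTF-8 encode it.
# This sidesteps endianness and float-format differences entirely,
# at the cost of parsing (and UTF-8 handling) on the far side.
text = repr(value).encode("utf-8")
assert float(text.decode("utf-8")) == value
```

Option 2 is what the JSON transport above amounts to, applied uniformly to every value.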
JSON in some sense solves the problems of arrays and dictionaries, and maybe this is all that is needed. Otherwise perhaps protocol buffers are an option.
In the end, if going for a distributed system (aka microservice) architecture, there are many "off the shelf" solutions for the Python (or other programming language) end; what is needed is the support at the Chapel end. It sounds like 0MQ is a good way forward, since it can support a multitude of communication protocols, including the dreaded HTTP.
A huge advantage of this direction is that I could demo a local Python/matplotlib renderer for a computation on a Cray monster supercomputer at PyConUK 2018 in September. :-)
I am happy to trial stuff here.
We do have some JSON support, but I don't think it's been developed in a way that works with ZMQ just yet (I did wonder if it would be useful; it does seem worth investigating once we are able to export classes/records).
Thanks :) I'll be sure to let you know when I get a full prototype generator up and running.
This repository has long been finicky in terms of the build setup. We are investigating an alternative strategy for Python interoperability, relying on Cython and ZMQ. For more details, check the Python Interoperability epic on the Chapel repository. We were content to let this repository languish, making minor fixes where possible, but a recent update to the Chapel repository caused this one to break in a fashion that seems more difficult to fix. So, with regret, we are retiring it.