MonetDB / pymonetdb

The Python API for MonetDB
https://www.monetdb.org/
Mozilla Public License 2.0

Add MonetDB 10 protocol support #54

Closed gijzelaerr closed 2 years ago

gijzelaerr commented 5 years ago

Described in this paper: http://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf

This fork implements the changes https://github.com/gijzelaerr/pymonetdb/compare/master...Mytherin:master

joerivanruth commented 2 years ago

MAPI 10 support has been removed from the server side so we will never merge this

gijzelaerr commented 2 years ago

ah really? why? it was supposed to be faster right (since binary)

joerivanruth commented 2 years ago

It was a good idea, but as far as I know the implementation was never completed beyond the proof-of-concept level. No clients used it, not even mclient/libmapi. The server-side implementation copy-pasted a lot of existing code and left modified duplicates that were never integrated back into the original. Over the years the code started to rot, and at some point it was decided to remove it. Some of it has been removed, some of it still lingers. Since then there has been a new development, "COLUMNAR_MODE", that also passes binary data. IIRC it is used in some kind of server-to-server communication (but not REMOTE TABLE, that is another barrel of fish). I think @aris-koning knows more about it. We have been talking about adding support for it to client libraries, but one of the blockers is that it does not support partial transfers (reply_size), and I don't know if and how it supports the more esoteric data types.

aris-koning commented 2 years ago

Just to give a bit more background: the implementation of remote tables is based on the remote MAL module. That module offers a MAL interface to create server-to-server connections and to move binary BAT files and MAL programs between the servers, where the latter can be executed as remote procedures. The original use case for this module is indeed the REMOTE TABLE feature. However, I have done some work on the remote module to integrate it more closely into the MAPI protocol, where it is accessible under the columnar_protocol flag:

MapiMsg
mapi_set_columnar_protocol(Mapi mid, bool columnar_protocol)
{
    if (mid->columnar_protocol == columnar_protocol)
        return MOK;
    mid->columnar_protocol = columnar_protocol;
    if (!mid->connected)
        return MOK;
    if (columnar_protocol)
        return mapi_Xcommand(mid, "columnar_protocol", "1");
    else
        return mapi_Xcommand(mid, "columnar_protocol", "0");
}
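To make the control flow of the setter above easier to follow, here is a minimal, self-contained sketch that mirrors it against a mock connection handle. It does not link against libmapi: `MockMapi`, `mock_Xcommand`, `mock_set_columnar_protocol` and `demo_columnar_toggle` are hypothetical names made up for illustration, and the `name=value` string the mock records is not the real wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the libmapi types; NOT the real definitions. */
typedef int MapiMsg;
#define MOK 0

typedef struct {
    bool connected;
    bool columnar_protocol;
    int commands_sent;        /* mock-only: number of X commands issued */
    char last_command[64];    /* mock-only: last command as "name=value" */
} MockMapi;

/* Mock of mapi_Xcommand: records the command instead of contacting a server. */
static MapiMsg
mock_Xcommand(MockMapi *mid, const char *name, const char *value)
{
    snprintf(mid->last_command, sizeof(mid->last_command), "%s=%s", name, value);
    mid->commands_sent++;
    return MOK;
}

/* Same control flow as the setter quoted above, applied to the mock handle. */
static MapiMsg
mock_set_columnar_protocol(MockMapi *mid, bool columnar_protocol)
{
    if (mid->columnar_protocol == columnar_protocol)
        return MOK;           /* value unchanged: nothing to do */
    mid->columnar_protocol = columnar_protocol;
    if (!mid->connected)
        return MOK;           /* not connected: only remember the setting */
    return mock_Xcommand(mid, "columnar_protocol", columnar_protocol ? "1" : "0");
}

/* Walks through the cases and returns how many commands the mock recorded. */
static int
demo_columnar_toggle(void)
{
    /* Case 1: not connected, so the flag is stored and nothing is sent. */
    MockMapi offline = { .connected = false };
    mock_set_columnar_protocol(&offline, true);
    assert(offline.columnar_protocol && offline.commands_sent == 0);

    /* Case 2: connected, so changing the flag issues one command. */
    MockMapi online = { .connected = true };
    mock_set_columnar_protocol(&online, true);
    assert(online.commands_sent == 1);

    /* Case 3: setting the same value again is a no-op. */
    mock_set_columnar_protocol(&online, true);
    assert(online.commands_sent == 1);

    /* Case 4: switching back off issues a second command. */
    mock_set_columnar_protocol(&online, false);
    assert(strcmp(online.last_command, "columnar_protocol=0") == 0);
    return online.commands_sent;
}
```

Calling `demo_columnar_toggle()` exercises all four cases. The point of the sketch is only that the setter is idempotent and that it talks to the server solely when a connection already exists; what the real library does with a flag that was set before connecting is not shown here.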

I have used PROTOCOL_COLUMNAR to implement the first version of remote connections in monetdbe. I chose to base this on the remote module because PROTOCOL_10 did not seem to be as complete, or as widely used, in the core monetdbe code base, while the remote module was already much better tested because of its use in the implementation of REMOTE TABLE.

But PROTOCOL_COLUMNAR also requires some scrutiny and probably some improvement:

For one thing, it does serialize vheap-based BAT files like string columns. Also, the interaction with the MAPI replay feature needs some attention. And there is probably more if we dig deeper. But I think it first requires a use case before we go deeper there. Maybe better communication performance in pymonetdb would be a nice use case. If so, we might want to change the title of this issue.