MDSplus / mdsplus

The MDSplus data management system
https://mdsplus.org/

object oriented thin client api in python #1074

Closed zack-vii closed 6 years ago

zack-vii commented 7 years ago

This issue is dedicated to the object oriented thin client implementation in python.

To begin the discussion I will float some ideas: I propose a new class, remote, that holds the remote connection information and can be passed to and shared between Data objects.

The field Data.ctx can become a "remote" object which contains all information about the connection.

from MDSplus import Connection
class remote(object):
    def __init__(self, host, id):
        self.connection = Connection(host)  # the Connection object
        self.id = id  # unique identifier
    @property
    def variable(self):  # name of the variable holding the remote representation of the Data object
        return "_py_%08x" % self.id

The local object is lightweight and contains only information about the remote variable. Data is pulled only on request, e.g. via OBJ.data().
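The idea above can be sketched roughly as follows. This is a minimal illustration, not MDSplus code: FakeConnection, Remote and RemoteHandle are hypothetical names, and the fake connection only mimics a get() call against a dictionary of variables so that the sketch runs without a server.

```python
import itertools


class FakeConnection(object):
    """Hypothetical stand-in for a thin-client connection (not the MDSplus class)."""

    def __init__(self, host):
        self.host = host
        self.variables = {}  # pretend server-side variables
        self.requests = 0    # number of round trips made so far

    def get(self, name):
        self.requests += 1
        return self.variables[name]


class Remote(object):
    """Context shared between objects: the connection plus a unique id."""

    _ids = itertools.count(1)

    def __init__(self, connection):
        self.connection = connection
        self.id = next(Remote._ids)

    @property
    def variable(self):
        # Name of the remote variable holding the server-side representation.
        return "_py_%08x" % self.id


class RemoteHandle(object):
    """Lightweight local object; it knows only where the remote value lives."""

    def __init__(self, ctx):
        self.ctx = ctx

    def data(self):
        # Data crosses the wire only when explicitly requested.
        return self.ctx.connection.get(self.ctx.variable)


conn = FakeConnection("host")
handle = RemoteHandle(Remote(conn))
conn.variables[handle.ctx.variable] = [1, 2, 3]  # pretend the server evaluated something
print(handle.ctx.variable, conn.requests)  # no request has been made yet
print(handle.data(), conn.requests)        # the first request happens here
```

Creating the handle costs no round trip; only data() talks to the connection, which is the property the proposal is after.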

ALTERNATIVELY

A RemoteData object that is not an instance of Data but is just as general. The object contains all the information about the remote connection and is linked to a remote TDI variable. All operations are evaluated remotely, and the result can be pulled via a method get(), which would serialize the remote data and deserialize it locally into an MDSplus.Data type.

joshStillerman commented 7 years ago

Timo - I am glad this idea interests you. I looked into PYRO https://pythonhosted.org/Pyro4/ before the September meeting. At first I thought I might be able to just hand them our objects and let them make them remote. No such luck. Digging in a bit, I find that they recommend creating a set of proxy objects on the client side, connected to 'real' objects on the server side. In the case of Pyro it is not clear whether either side can or should hold the originals.

Reading your message, I am not sure how this would interact on each end of the connection. We have (I think) three kinds of objects: Tree, Data, and TreeNode (and Connection). I think in the end we will want to be able to create any of these three remotely. This could be Connection.Tree .... or Tree.ctx.connection ...

Thoughts ?

-Josh

tfredian commented 6 years ago

There are lots of issues in attempting this, and as far as I can tell they are not addressed in Gabriele's C++ remote objects work either. One of the most difficult problems in implementing an effective thin client version of our objects interface is handling things stored in "Compound" instances. One could imagine a complete set of proxy classes that refer to the real objects on the server end, where all methods and properties of the proxy classes are evaluated on the server side and return proxy class instances unless the value of the method or property is a python primitive object or a numpy instance. For example:

val=connection.execute('build_with_units(1 : 100,"volts")')
dim=connection.execute('build_dimension(....)')
sig=connection.Signal(val,None,dim)

val, dim and sig would be proxy objects referencing objects on the server side. sig would be a proxy Signal instance and in this case would have a proxy Range object with potentially proxy units. If you then did:

y=sig.value

you would get back a proxy Range object. If you then did:

y.data()

you would get back a local numpy array.

Similarly if you did:

sig=connection.Signal(numpy-array,None,numpy-array)

The client would need to send the arrays to the server and sig would be a proxy instance referring to the signal on the server.

Then if you did:

sig = sig + 42

it would need to tell the server to perform the operation and it would return a new proxy object.

All of the proxy objects would need to have __del__ methods which would clean up the corresponding objects on the server side.

The implementation of the proxy objects would require considerable development, I believe. It would be possible to have a set of python TDI funs which hold onto a global list of python object instances and return some special answer that the client could interpret as an identifier for a python instance on the server.
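A rough sketch of the mechanism described above, under loud assumptions: FakeServer, Proxy and the toy Signal class are hypothetical and run in one process, a Python list stands in for a numpy array, and real TDI funs or mdsip traffic are not modelled at all. It only shows the bookkeeping: a server-side table of instances, identifiers handed to the client, operations forwarded to the server, and cleanup in __del__.

```python
import itertools

BY_VALUE = (bool, int, float, str, list)  # list stands in for a numpy array here


class Signal(object):
    """Toy server-side object; a stand-in for a real MDSplus Signal."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        return Signal(v + other for v in self.values)

    def data(self):
        return list(self.values)


class FakeServer(object):
    """Hypothetical server that keeps the real objects in a table keyed by id."""

    def __init__(self):
        self.objects = {}
        self._ids = itertools.count(1)

    def register(self, obj):
        ident = next(self._ids)
        self.objects[ident] = obj
        return ident

    def invoke(self, ident, method, *args):
        result = getattr(self.objects[ident], method)(*args)
        if isinstance(result, BY_VALUE):
            return ("value", result)           # sent back to the client
        return ("ref", self.register(result))  # stays on the server

    def release(self, ident):
        self.objects.pop(ident, None)


class Proxy(object):
    """Client-side handle that forwards every operation to the server."""

    def __init__(self, server, ident):
        self._server = server
        self._ident = ident

    def _forward(self, method, *args):
        kind, payload = self._server.invoke(self._ident, method, *args)
        return payload if kind == "value" else Proxy(self._server, payload)

    def __add__(self, other):
        return self._forward("__add__", other)

    def data(self):
        return self._forward("data")

    def __del__(self):
        # Free the server-side object once the client no longer needs it.
        self._server.release(self._ident)


server = FakeServer()
sig = Proxy(server, server.register(Signal([1, 2, 3])))
sig = sig + 42     # evaluated on the server; a new proxy comes back
print(sig.data())  # only now does data reach the client: [43, 44, 45]
```

Even in this toy form, every operator and method has to be forwarded explicitly, which gives a feel for why a complete proxy layer over the full object interface would be a large job.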

Unless I'm mistaken this would be a major project just to implement remote objects support, and it would be just as difficult, if not more so, to implement this in any of the other languages.

One big issue with such an approach is how the user controls the object environment. What I described above is not very useful (i.e. sig.connection.Signal()) since this is an explicit thin client reference in the code. The main driving force for thin client proxy objects is to enable having user code which is essentially identical whether you are using local/distributed client mode or thin client mode. This would imply some state setting which would be in effect when you did x=Signal(...). For example, setConnection(connection) or setConnection(None).
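The state-setting idea could look something like the sketch below. Only the name setConnection comes from the comment above; LocalSignal, ProxySignal and the factory function are hypothetical, and the proxy class does nothing beyond remembering its connection.

```python
_connection = None  # module-level state: None means local mode


def setConnection(connection):
    """Select thin-client mode (a connection) or local mode (None)."""
    global _connection
    _connection = connection


class LocalSignal(object):
    """Hypothetical stand-in for an ordinary, locally held Signal."""

    def __init__(self, value, raw, dim):
        self.value, self.raw, self.dim = value, raw, dim


class ProxySignal(object):
    """Hypothetical stand-in for a proxy to a Signal held on the server."""

    def __init__(self, connection, value, raw, dim):
        self.connection = connection
        # A real implementation would ship the arguments to the server here.
        self.args = (value, raw, dim)


def Signal(value, raw, dim):
    """Factory with one call signature for both modes."""
    if _connection is None:
        return LocalSignal(value, raw, dim)
    return ProxySignal(_connection, value, raw, dim)


setConnection(None)
x = Signal([1, 2, 3], None, [0.0, 0.1, 0.2])
print(type(x).__name__)

setConnection("hypothetical-connection")
x = Signal([1, 2, 3], None, [0.0, 0.1, 0.2])
print(type(x).__name__)
```

The user code that builds x is identical in both halves; only the preceding setConnection call differs.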

Another big consideration is whether there are significant performance issues which may be particular to the analysis being performed. In some cases the applications may need to be optimized (by the user) depending on whether they are connecting to a remote site or analyzing local data. A general solution that removes the need to use different user code for local vs thin client is a nice concept, but I could envision that many applications would perform better if the author optimized the code for local or thin client operation.

Sorry to rattle on so much about this but we've discussed this over and over for many years (probably related discussions even predating the object oriented enhancements) and thought about some of these issues and then nothing got done (perhaps for good reason).

GabrieleManduchi commented 6 years ago

Hi Tom, the current C++ thin client TreeNode does not cover at all the functionality you are mentioning, which, you are right, would require a really big effort (is it worth it? I don't think so). The current implementation provides a subset of the functionality, and it sometimes behaves differently:

1) Only a subset of TreeNode methods has been implemented, i.e. only those that can be implemented with remote expression evaluation.

2) There is no managed tree context. A Tree is opened with connection.openTree() and then it represents the static context for that connection (the old-fashioned MDSplus approach, adopted by more than 90% of the users).

3) The method getData() has different semantics from the original TreeNode.getData(): it returns the evaluation of the data content of that tree node, i.e. scalars or arrays, as supported by the thin client.

Having said that, one may wonder why we should implement TreeNodeThinClient. I think it is for the following reasons, triggered by user requirements:

1) In most cases the thin client functionality covers all that users need, which is getting data. On a high-latency connection, performance can be greatly improved. This does not add anything to the direct use of Connection.get(), however, but it makes the program interface more uniform.

2) Handling segments: it turns out that many users (especially those using LabVIEW) need to write segmented data via a thin client connection.

Observe that all of the above can be done using Connection, and indeed TreeNodeThinClient is basically a wrapper for Connection, but I am convinced such wrappers can improve the quality of the interface (hiding all the tricks required to make TDI funs work). In LabVIEW, this makes things very clean.
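To illustrate the "wrapper for Connection" point in Python terms, here is a hedged sketch. It is not the C++ TreeNodeThinClient: the class below is a hypothetical Python analogue, RecordingConnection is a stand-in that only records calls, and the TDI expressions it builds (data(...) and getnci(..., "LENGTH")) are illustrative choices rather than what the C++ code sends.

```python
class RecordingConnection(object):
    """Hypothetical stand-in for a thin-client Connection; it records calls."""

    def __init__(self):
        self.calls = []

    def openTree(self, tree, shot):
        # The open tree becomes the static context for this connection.
        self.calls.append(("openTree", tree, shot))

    def get(self, expression, *args):
        self.calls.append(("get", expression) + args)
        return 0.0  # placeholder result; a real server would evaluate the expression


class TreeNodeThinClient(object):
    """Node-like wrapper; each method is one remote expression evaluation."""

    def __init__(self, connection, path):
        self.connection = connection
        self.path = path

    def getData(self):
        # Returns evaluated data (scalar or array), not a full Data object.
        return self.connection.get("data(%s)" % self.path)

    def getLength(self):
        # Node characteristics are also fetched through an expression.
        return self.connection.get('getnci(%s, "LENGTH")' % self.path)


conn = RecordingConnection()
conn.openTree("tree-name", 1)
node = TreeNodeThinClient(conn, "\\IP")
node.getData()
node.getLength()
for call in conn.calls:
    print(call)
```

Everything the wrapper does could be written directly against the connection; what it adds is a uniform node-style interface that hides the expression strings.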

Ciao

Gabriele



tfredian commented 6 years ago

Gabriele, I agree that adding more functionality to the connection object to simplify various common user operations like writing segments is a good idea and fairly easy to implement. Some of the discussion, however, was about providing a complete and compatible object interface which would enable a user to take an application written entirely using the object interface, simply add an mdsip connect, and have the same application work efficiently by retaining real object instances on the server and doing all the transaction-laden data manipulations locally on the server. As we both mentioned, this would likely be a major development job with a potentially questionable return on that investment. If you add either a tutorial or a reference document covering all the new connection methods, we could further discuss what people think is the path forward, proxy objects or an enhanced connection object, for all the object oriented languages we support. It would not surprise me if you had already provided that documentation and I have just forgotten!

joshStillerman commented 6 years ago

I found rpyc, and it seems to work (example below). We would just need to decide whether to package it or just document it. For now the numpy objects returned do not work correctly; I filed an issue on their GitHub for it. In the meantime there is a utility function called obtain that retrieves a native object from the remote side.

$ python
Python 2.7.5 (default, May  3 2017, 07:55:04)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from rpyc.utils.zerodeploy import DeployedServer
>>> from plumbum import SshMachine
>>> import rpyc as r
>>> mach = SshMachine('cmodws118')
>>> server = DeployedServer(mach)
>>> c = server.classic_connect()
>>> m = c.modules['MDSplus']
>>> t = m.Tree('cmod', 1090909010)
>>> t
Tree("CMOD",1090909010,"Normal")
>>> ip = t._IP
>>> ip
\MAGNETICS::IP
>>> dd = r.utils.classic.obtain(ip.data())
>>> type(dd)
<type 'numpy.ndarray'>
>>> dd
array([-937.37731934, -313.14221191, -624.23510742, ..., 996.80712891,
        683.66491699,  683.66491699], dtype=float32)
>>>

All that is needed to make this work is:

On the client computer: the rpyc python package, plumbum and numpy

On the server computer: MDSplus and access to the trees

I am not sure where we would put the code, but if the object returned is a subclass of a non-MDSplus type we could 'obtain' it.

tfredian commented 6 years ago

We now have a working remote object python interface which enables you to use MDSplus objects to remotely access an MDSplus server via ssh without having any of the standard MDSplus software installed on the client machine. To use this remote object python interface install using:

pip install mdsconnector [--user]

'''
from mdsconnector import mdsConnector
c = mdsConnector('host')  # optional arguments for username, password, ssh-key, etc.
t = c.Tree('tree-name',shot-num)
node = t.getNode('node-path')
data = node.record.data()
'''

The full object interface to MDSplus is supported using this module. Data remains on the server until a function or method is used which returns a native python data type (str, int, float) or a numpy data type (numpy arrays and scalars). If you want to transmit data from the client to the server, you can do so by passing a native python data type or numpy data type to an MDSplus function or method.

The source code for the mdsconnector module can be viewed at:

https://github.com/MDSplus/mdsConnector