FreeOpcUa / python-opcua

LGPL Pure Python OPC-UA Client and Server
http://freeopcua.github.io/
GNU Lesser General Public License v3.0
1.37k stars 660 forks source link

async client preview #109

Open huazh opened 8 years ago

huazh commented 8 years ago

https://github.com/huazh/python-opcua/commit/896dd171cf90b955c1eda0244742fb24c3752a02

Because of the major refactoring, I have not merged the head of master.

TODO: 1, remove KeepAlive thread 2, review security_policy carefully, it may contains race condition 3, attack subscription race condition

Suggestion is welcome.

oroulet commented 8 years ago

Looks like things got much more complicated than we thought!

oroulet commented 8 years ago

I am really not fan of all those:

def binclient_method(self, params):
    def new_req():
         xxxx
    def on_reasp():
        xxxx
return self._send_request(new_req, on_resp)

But I do not know how to avoid them... yet...:-)

oroulet commented 8 years ago

Can't we just not use them in most places? If we look at :+1:

def translate_browsepaths_to_nodeids(self, browsepaths):
    def new_req():
        request = ua.TranslateBrowsePathsToNodeIdsRequest()
        request.Parameters.BrowsePaths = browsepaths
        return request

    def on_resp(data):
        response = ua.TranslateBrowsePathsToNodeIdsResponse.from_binary(data)
        response.ResponseHeader.ServiceResult.check()
        return response.Results

    return self._send_request(new_req, on_resp)

Then I see nothing that need to be done in loop We could just write

def translate_browsepaths_to_nodeids(self, browsepaths):
        request = ua.TranslateBrowsePathsToNodeIdsRequest()
        request.Parameters.BrowsePaths = browsepaths
        response = self._send_request(request)
        response = ua.TranslateBrowsePathsToNodeIdsResponse.from_binary(data)
        response.ResponseHeader.ServiceResult.check()
        return response.Results

And only use the complex form when we really need it (create_session, create_secure_channel and create_subscription)

oroulet commented 8 years ago

or am I missing something?

oroulet commented 8 years ago

Btw, what about using decorators to send every method to loop as coro and have send_request use yield?

oroulet commented 8 years ago

Thinking about it. Leave the loop creation in Connect_socket. It is much easier for now

huazh commented 8 years ago

BinaryClient can not share the loop thread with others. The opcua client protocol is totally asynchronous. It means, if sharing the event loop, all the requests must return an asyncio.Future object. In contrast, if doesn't share the event loop, all the requests must return the result (or concurent.future?). I can not find a way to unify these two semantics without modify Client/Node/Subscription class.

In current commit, all the new_req() is not need. But it seems security_policy may be accessed simultaneously in Client and UAClientProtocol. So I chose to keep the new_req(). Another option is to remove it and make special case for some requests.

The on_resp() callbacks are keep there for implementing sharing loop BinaryClient() easily. If someone inherits BinaryClient and override _send_request to return an asyncio.Future instead of response, the BinaryClient is almost sharing loop ready. Then the users of BinaryClient can either add callback to the Future or use coro to get response.

I've thought about using yield and decorator. It can be written in two ways: 1, Using what asyncio/trollius provides. But if we decide to support these two packages, I can't use them. 2, Implementing our own yield/decorator. I've tried this way (if yield a Future, wait for the Future, otherwise consider it as the final result). But I found if I take care all the corner cases (especially the exception), what I implement is the half baked version of Trollius and make the code hard to read.

oroulet commented 8 years ago

The opcua client protocol is totally asynchronous. It means, if sharing the event loop, all the requests must return an asyncio.Future

Is it correct? yes all our request must return a future, but there is nothing that prevent to use loop for other things.

I think this coroutine stuff can really help us so I did some small tests. We cannot have both asyncio corouting and trollius corouting in same files but we only need coroutines at one place the send_request method so maybe something like this can work

try:
    import asyncio
    import as3
    hello_world = as3.hello_world
except ImportError:
    import trollius as asyncio
    import as2
    hello_world = as2.hello_world

loop = asyncio.get_event_loop()
loop.run_until_complete(hello_world())
loop.close()

with as2.py containing the trollius syntax

def hello_world():
      print("Hello World!")
     yield From(asyncio.sleep(1))
      print("finished World!")

and as3.py the asyncio sintax

def hello_world():
      print("Hello World!")
      yield from asyncio.sleep(1)
      print("finished World!")
oroulet commented 8 years ago

btw I found out the new shinny syntax for asyncio is:

async def hello_world():
     print("Hello World!")
     await asyncio.sleep(1)
     print("finished World!")
huazh commented 8 years ago

There are 2 problems:

def new_req(param):
  return req
def on_resp(data):
  return resp

as2.py:

def read(param):
  req = new_req(param)
  data = yield From(send_request(req))
  raise Return(on_resp(data))

as3.py:

def read(param):
  req = new_req(param)
  data = yield from send_request(req)
  return on_resp(data)
oroulet commented 8 years ago

The following seems to work for me, isnt'it OK? absolutely everything is ran in loop so it should be no race condition and both methods return the same object with python 2 and 3 as2.py

def send_request(request):
    yield From(asyncio.sleep(1))
    raise Return( request * 2)

as3.py

async def send_request(request):
    await asyncio.sleep(1)
     return request * 2

main

from opcua.common.utils import ThreadLoop
#from IPython import embed

def ua_async(func):
    def wrapper(val):
        return l.run_coro_and_wait(func(val))
    return wrapper

@ua_async
def read(val):
    params = val * 2
    val = send_request(params)
    return val

if __name__ == "__main__":
    l = ThreadLoop()
    try:
        l.start()
        read(5.45)
        time.sleep(3)
    finally:
        print("Stopping loop")
    l.stop()
oroulet commented 8 years ago

of course you need to replace "await asyncio.sleep" with "await myfuture", but that should work. You probably need to use future object from asyncio and not the one from concurrent

huazh commented 8 years ago
@ua_async
def read(val):
    params = val * 2
    val = send_request(params)
    return val

In read(), you can only directly return what send_request return. Because send_reqeust() is a coroutine, not read(). You can't post process the val in any way, a callback style must be used to post process the actual data.

oroulet commented 8 years ago

you are right we can't :-( , the obvious solution is to make read an async method.... but then we have different syntax for trollius and asyncio...

I do not know what to do further with client, async framework can remove some locking but obviously makes every method more complex with callbacks everywhere while new async syntax is very nice but totally not python2 compatible:

async def read_async(val):
      params = val * 2
      val = await send_request(params)
      return val * 2

 @ua_sync
  def read(val):
       return read_async(val)

Maybe we can live with 2 versions of BinaryClient? One for python2 and one for python3 with pure asyncio?

Then we have the common classes like Node and Subscriptions they also use locks and could use a loop instead but then we get to the same issue with python2 support... If you have any idea without havinf callbacks everywhere let me know

as far as I know there is no real bug in client but a race condition because server can send notification before we have handled the response, This bug is very theoretical, I have never seen it happen in real life, it does happen in tests because I made it happen by will. asyncio will not solve it, it requires a queue of notifications and added complexity, not sure it is worth it

huazh commented 8 years ago

The race condition in subscription is not theoretical. For example, if the client more than one subscription, there are almost certain server has many publish requests in queue. Then if a new create monitored item is requested, the server can publish result with new items very quickly. And the python GIL make the case even worse, the recv thread may have processed create monitored items response and publish response with the user thread is locked with GIL. The first publish response after create monitored items is very important for the client, it contains the current value of the node. If lost, client may have no chance to learn the value of the node. Not only the client, the server suffers from the problem too. It happens occasionally, but it is there.

I'm facing another problem with current client. In my program, I need to exchange information between opcua servers. It means I need to create a writer thread for every clients except recv thread and keepalive thread. And every subscription notification must be handle very quickly without blocking in recv thread then transfer to corresponding writer thread carefully. In simple situation, it is still manageable. But if exception raises, the situation becomes a disaster to handle. The current status of the opcua client may be enough for a simple read/write program, but not enough for a complicate usage.

As for avoiding callback style, the options in my minds are:

  1. Drop support for asyncio, use trollius
  2. Implementing our own version of coroutine. But correctly handling exception in coroutine in python is tricky. In a simple implantation, it's hard to debug program because traceback is a messy.
oroulet commented 8 years ago

Do you use python2 or python3? We could also just fork python-opcua and make a python3 only version. I do not like the generator api but the new async await api is very nice. Python2 is dying anyway, even pypy has now a python3 version,

huazh commented 8 years ago

I have not decided which to use yet. await/async is a feature of Python3.5. I don't think it's a good idea to create a library with python3.5 only. Pypy have not put their focus in python3 version for now. Another sad news is trollius doesn't support on pypy windows version.

oroulet commented 8 years ago

I don't think it's a good idea to create a library with python3.5

if we drop python 2.7 then supporting python > 3.5 is not problem in one year all python3 users are on 3.5. But dropping 2.7 is a bit hard for me... I am afraid to encounter it...

huazh commented 8 years ago

There is another way to fix the gap between python3/2. We can rewrite the package in setup.py, translate py3 asyncio to py2 trollius in the fly.

oroulet commented 8 years ago

Some people have tried, it is difficult because of the raise Return stuff,... But maybe...

huazh commented 8 years ago

If we limit coroutine syntax, it could be done. For example, we can forbidden usage like:

def func():
  yield from ....
  try:
  ...
  return something
  except:
  ...

Automatically translation doesn't mean we need support all syntax in coroutine. We still need take care the syntax different of them. But even limit usage of coroutine is big advantage. To avoid unintentionally rewriting, we can limit rewriting in block labeled by comment like:

# START_COROUTINE_REWRITE
# END_COROUTINE_REWRITE
oroulet commented 8 years ago

Do you give it a try? with BinaryClient class based on your current work? use async and await and no label and we will see if we manage to make a script to convert it to trollius or if we need labels

every ua method can then have an async version ending with _async and sync version which is compatible with current code and is just the async method with a decorator

huazh commented 8 years ago

I'm trying. Hopefully tonight.

oroulet commented 8 years ago

great! just do the asyncio code, we can have a look at the convertion to trollius after the code is working.

I also need to do an in-depth review, I see there are some some strange things happening when several people, not knowing the entire code base, send patches...

huazh commented 8 years ago

https://github.com/FreeOpcUa/python-opcua/commit/c34112d043319cfcfbbe2a041c932ce1c744aea6

I have not written the auto translate script. But it's very simple to do for now. I'm using trollius syntax, just replace yield From -> yield from and raise Return -> return in binary_client.py, it will be asyncio compatible. There are two binary client classes: AsyncBinaryClient is the asyncio version, BinaryClient is sync version. An uaasync.py is added to common dir. All AsyncBinaryClients share the loop provided by uaasync.

uaasync.py has a function install_loop(loop). It must be called before using any other code. If called, uaasync will use the loop as default loop and will not create additional loop thread. It also means the sync version BinaryClient can not be used anymore. It is user's responsibility to schedule AsyncBinaryClient.

oroulet commented 8 years ago

This starts to look OK.

1 -- It still think we should use the new async/away syntax in main code and convert to trollius. The new syntax is much more intuitive and less hacky (No useless decorator for example).

2-- Did you loot at the ThreadLoop class in utils.py? Looks like your code re-implement some stuff there

3-- > @await_super_coro => why not @uasync

4--

  @await_super_coro
   def read(self, parameters):
        pass

Does that work? Don't we override async method with 'pass'? I am not sure about spliiting async and sync API, maybe some users want both?

btw the name BinaryClient is wrong, it is inherited from C++ where the binary protocol was hard-coded. Here the code will be exactly the same for xml implementation we will just call to_xml() instead of to_binary() at the lowest socket write level. Maybe we could just rename it to UaClient? @huazh @alkor @destogl

oroulet commented 8 years ago

btw @huazh . why is Protocol split into 2 classes now? Is it because you think you can reuse part of it in server? then it should really be merge with ThreadPool

huazh commented 8 years ago

Trollius syntax is more limited. Rewriting from yield From() to yield from is simpler. A yield From() statement is always yield from, but the opposite is not always true (any function can use yield from, it may not be a coroutine, rewrite it to yield From is not correct). The situation for raise Return() is the same as yield. As I am testing the async version client, I found I may be over optimistic about asyncio/trollius in data race condition. There is a subtle difference in coro scheduling between trollius and asyncio. In Trollius, if a coro yield From(future/coro) or a function using future.add_done_callback, while the result is available, the coro/callback is scheduled using call_soon() to append to loop’s runnable queue. On the other hand, in asyncio, yield from future or future.add_done_callback() is the same as trollius, but the code after yield from coro is run immediately after the waiting coro is finished. From race condition’s point of view, asyncio’s scheduling is more friendly as we can control the wakeup sequence of the waiting coro. But the asyncio PEP doesn’t specify this scheduling algorithm, it may be changed in the future. The difference can be shown in code below:

Trollius:

import trollius as asyncio
from trollius import From, Return

fut1 = asyncio.Future()
fut2 = asyncio.Future()

@asyncio.coroutine
def coro1():
    print("coro1 start")
    x = yield From(fut1)
    print("coro1 end")
    raise Return(x)

@asyncio.coroutine
def coro2():
    print("coro2 start")
    x = yield From(coro1())
    print("coro2 end")
    raise Return(x)

def cb(fut):
    print("cb start")
    x = fut.result()
    print("cb end")

def trig():
    print("set fut1")
    fut1.set_result(1)
    print("set fut2")
    fut2.set_result(1)
    print("trig done")
    loop.call_later(1, loop.stop)

loop = asyncio.get_event_loop()
asyncio.async(coro2())
fut2.add_done_callback(cb)
loop.call_later(1, trig)
loop.run_forever()
loop.close()

Asyncio:

import asyncio

fut1 = asyncio.Future()
fut2 = asyncio.Future()

@asyncio.coroutine
def coro1():
    print("coro1 start")
    x = yield from fut1
    print("coro1 end")
    return x

@asyncio.coroutine
def coro2():
    print("coro2 start")
    x = yield from coro1()
    print("coro2 end")
    return x

def cb(fut):
    print("cb start")
    x = fut.result()
    print("cb end")

def trig():
    print("set fut1")
    fut1.set_result(1)
    print("set fut2")
    fut2.set_result(1)
    print("trig done")
    loop.call_later(1, loop.stop)

loop = asyncio.get_event_loop()
asyncio.async(coro2())
fut2.add_done_callback(cb)
loop.call_later(1, trig)
loop.run_forever()
loop.close()

If trollius is supported, we need found a way to serialized the response process.

huazh commented 8 years ago

Yes, I knew the ThreadLoop class. Some code is borrowed from there. But uaasync is served for different idea. 1. uaasync.get_loop() is the single global loop for whole package, we don't need pass the loop all over place. 2. can install a external loop from user. 3. May or may not have a loop thread, depends on whether the loop is external, but code using uaasync should not depend on it. If someday we rewrite Server to async version, the ThreadLoop could be striped away.

There are two decorators provide by uaasync: await_super_coro and await_super_call. They must be used to decorate method of class. While the method is called, the actually called method is the one from super class, but transfer to thread loop. The method decorated is not called and is only provided as a name indicator. await_super_coro to decorate coroutine, await_super_call to decorate function.

I'm not fully understand what you mean about 2 protocol. It's only one protocol for client. And it's not compatible with the server one.

huazh commented 8 years ago

Async and sync API is not the same thing. It's not a good idea to mix them. In fact, I prefer providing async and sync version API on Client/Server class, rewrite BinaryClient and InternalServer/Session to async only.

oroulet commented 8 years ago

I'm not fully understand what you mean about 2 protocol

you have 2 classes UAClientProtocol and uaasync.Protoco. I wondered why you did split the class and if we could merge some stuff with ThreadLoop

There is a subtle difference in coro scheduling between trollius and asyncio

Yes it looks likes we cannot rely on asyncio behaviour here.

I let you look at this further

jk987 commented 8 years ago

Re: huazh

I don't catch the whole discussion, but this one interests me:

... The race condition in subscription is not theoretical. For example, if the client more than one subscription, there are almost certain server has many publish requests in queue. Then if a new create monitored item is requested, the server can publish result with new items very quickly. And the python GIL make the case even worse, the recv thread may have processed create monitored items response and publish response with the user thread is locked with GIL. The first publish response after create monitored items is very important for the client, it contains the current value of the node. If lost, client may have no chance to learn the value of the node. Not only the client, the server suffers from the problem too. It happens occasionally, but it is there. ...

I slightly remember that couple months ago I tested creating dozens of monitored items (and handling PublishRequest/PublishResponse during the same time) and that we succesfully solved this collision. I found it was issue #46.

huazh commented 8 years ago

Hi @jk987 If you run the test, you will find some warning logging like: Received a notification for unknown handle: 220 It means create monitored items response is processed after notification processing. It is more easily triggered by subscription with very short sampling interval. If you insert time.sleep(1) before with self._lock: in method Subscription.create_monitored_items(), it will happen almost always.

oroulet commented 8 years ago

What is the status on this one? Please do no send one big patch trying to solve everything. One thing at a time.

huazh commented 8 years ago

Sorry for the long pause in the development. I'm currently busy in work as the longest holiday in my country is coming. I'll update it in the holiday. I'll try my best to avoid big patch.

张华 Zhang, Hua

On Thu, Jan 28, 2016 at 2:39 PM, ORD notifications@github.com wrote:

What is the status on this one? Please do no send one big patch trying to solve everything. One thing at a time.

— Reply to this email directly or view it on GitHub https://github.com/FreeOpcUa/python-opcua/issues/109#issuecomment-176011655 .

oroulet commented 8 years ago

No pb :-) this issue is more of a long term thinking

huazh commented 8 years ago

https://github.com/huazh/python-opcua/commit/8e3c7882a72da4c0d14d8a901b53d443e2030ee5

These commits solved the problem of trollius and asyncio schedule difference and add simple on-the-fly translate in setup.py.

oroulet commented 8 years ago

I really like most of this. But I really would prefer to maintain code based on async/await than the hacky trollius syntax. Can you send a PR written using async/await? Then I promise to write a script to convert to trollius :-) It should not be very hard. The only tricky part is that return -> return Raise convertion must only be done inside a coroutine. Maybe we can also write a script to generate sync class from async automatically?

huazh commented 8 years ago

It's very easy for me to rewrite using async/await. But I think it's very tricky to translate to trollius syntax. Considering following example:

async def f():
  def f2():
     return 1
  await something
  return f2()

And we can not use other powerful async syntax such as async with/ async for. It means async/await is just a syntax sugar, but add complexity in translating job.

I have thought about using metaclass to generating sync class from async class. But I choose simply write a new class at the end. Because 1. it's only one class; 2. it make code easy to read; 3. it's easy to make some customization; 4. it's easy to add documentation

oroulet commented 8 years ago

Maybe I am frivolous here, but I had a hard time understanding the yield syntax while the async await is obvious to me. Isn't your example quite artificial? If we have it at one place then we can just write a hack or decorator?

On Thu, Feb 18, 2016, 17:19 Zhang, Hua notifications@github.com wrote:

It's very easy for me to rewrite using async/await. But I think it's very tricky to translate to trollius syntax. Considering following example:

async def f(): def f2(): return 1 await something return f2()

And we can not use other powerful async syntax such as async with/ async for. It means async/await is just a syntax sugar, but add complexity in translating job.

I have thought about using metaclass to generating sync class from async class. But I choose simply write a new class at the end. Because 1. it's only one class; 2. it make code easy to read; 3. it's easy to make some customization; 4. it's easy to add documentation

— Reply to this email directly or view it on GitHub https://github.com/FreeOpcUa/python-opcua/issues/109#issuecomment-185798853 .

huazh commented 8 years ago

Nested function/class is a powerful tool in Python. It's not wise to limit it's usage.

Here is the await syntax: https://github.com/huazh/python-opcua/commit/024e0d921e54ab2e8daa50cf4d4c380e12fd8986 You could try to write a translator to yield from of py3.3-py3.4 and yield From of py2.7. Without the translator, it's not a complete patch. So I don't create a PR.

oroulet commented 8 years ago

Seems trollius is dead: http://trollius.readthedocs.org/deprecated.html#deprecated

So we should really not base our work on trollius!!

I am wondering about what to do. python 2.7 is still the only python version in several industrial systems I use. maybe releasing a 1.0 version working with python 2.7 then remove support for python < 3.5 in master........

zerox1212 commented 8 years ago

Just as an FYI with regards to only supporting python 3.5+. From what I saw PyQt 4 doesn't work with python 3.5, only 3.4 is supported.

oroulet commented 8 years ago

I use pyqt5 and python5 on Ubuntu.. But this async stuff is tricky..

On Thu, Mar 31, 2016, 16:19 Andrew notifications@github.com wrote:

Just as an FYI with regards to only supporting python 3.5+. From what I saw PyQt 4 doesn't work with python 3.5, only 3.4 is supported.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/FreeOpcUa/python-opcua/issues/109#issuecomment-203959679

Rosepeter commented 6 years ago

Is the plan to provide an asynchronous client still followed? I would really appreciate an asynchronous API here! In the mean-time I simply use the low-level UASocketClient to communicate with a server in an asynchronous fashion. The only downside is, that one does not get any notifications if the server dies in this case, as polling the server state in the KeepAlive-thread simply times out and no one is informed about that.

oroulet commented 6 years ago

@Rosepeter there you found the correct pr. We had a final issue with limitations of asyncio in python2. It is anyway the correct way to solve the issue. Looks like nobody is working on it for now. But any help is welcome

oroulet commented 6 years ago

Python2 is now dying so a Python 3 only fork is a possible solution

cbergmiller commented 6 years ago

I am working on an diverging fork of python-opcua that uses asyncio. It will support Python 3.6 and above. The client is mostly done, the server still needs some work. Consider it experimental at the moment although i use the client in production at work. Feel free to comment and contribute! https://github.com/cbergmiller/python-opcua/tree/make/asyncio

oroulet commented 6 years ago

Great! I will have a look

oroulet commented 6 years ago

@cbergmiller Could create a pull request? It will probably never be merged but it makes tracking change easier. I had a short look but there are many changes..... A fundamental point is that in the original async work from @huazh it was still possible to use the library in sync using some wrapper classes and decorators. I could not see such a thing in your work. Have you been thinking at it? The issues is that most people do not use async (In many applications it is not a necessity) and it would be great to keep only one code base.... Otherwise everything I could see looks correct...