Gallopsled / pwntools

CTF framework and exploit development library
http://pwntools.com

Feature: Async IO? #1236

Open ZackNoyes opened 5 years ago

ZackNoyes commented 5 years ago

It doesn't look to me like there is any support within Pwntools for asynchronous IO (for example, being able to start 100 remote processes and deal with them at the same time, rather than one after the other). Since there is also no Python 3 support, the only way I can see this being done is through Python modules like threading or multiprocessing. Just wondering, has this ever been considered for development, or is there an easy way this can be achieved?

zachriggle commented 5 years ago

No, there's no asynchronous I/O support (i.e. you cannot set a callback for when data arrives).

I expect that it could be added relatively easily with threading, with some simple logic that polls can_recv and fires off a callback.

Probably the most efficient way to do this is to define a new class that inherits from pwnlib.tubes.tube.tube, that proxies all xxx_raw calls to some other tube object, and supports the callback registration like shown above.

import threading

from pwnlib.tubes.tube import tube

class async_tube(tube):
    def __init__(self, child, callback):
        super(async_tube, self).__init__()
        self.child = child
        self.callback = callback
        # poll for data on a background thread
        self.thread = threading.Thread(target=self.threadfunc)
        self.thread.daemon = True
        self.thread.start()

    # invoke the callback whenever data is available;
    # stop once the callback returns a falsy value
    def threadfunc(self):
        while self.callback(self, self.recv()):
            pass

    # proxy all xxxx_raw routines to the child object
    def recv_raw(self, *a, **kw):
        return self.child.recv_raw(*a, **kw)
    ...

You'd then use it something like:

def my_callback_func(tube, data):
    print('received', repr(data))

r = remote(host, port)
r = async_tube(r, my_callback_func)
r.wait_for_close()

ZackNoyes commented 5 years ago

Thanks, I'll look into that. The easiest way I've found so far is to create a process pool through multiprocessing (or a thread pool) and then use Pool.map() to get an array of results for a particular call. This may not be the most efficient, but I think for most purposes it is good enough.

See this blog post.
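As a rough sketch of that approach: the `solve` worker and port list below are hypothetical stand-ins; in a real exploit the worker would open a pwntools tube with `remote(host, port)` and run the actual interaction logic.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical worker: in a real exploit this would do
# r = remote(host, port), interact with the service, and
# return the result. A stand-in keeps the sketch runnable.
def solve(port):
    return port * 2

ports = list(range(31000, 31100))  # 100 hypothetical targets
pool = ThreadPool(16)              # 16 concurrent workers
results = pool.map(solve, ports)   # results come back in input order
pool.close()
pool.join()
```

Because `ThreadPool` uses threads rather than subprocesses, each tube lives and dies inside the worker that created it, which sidesteps the cross-process state issue discussed below.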

zachriggle commented 5 years ago

The biggest restriction with using multiprocessing is that each tube's state will only be valid in the sub-process that it's running in -- and even then only if the tube isn't touched from the main process.

ZackNoyes commented 5 years ago

Does this apply even if a ThreadPool is used rather than a Pool of processes? (e.g. from multiprocessing.pool import ThreadPool)

zachriggle commented 5 years ago

It all depends on how the objects are instantiated. It's outside the scope of this issue to go into Python's limitations when threading or multiprocessing.

Ultimately, threading behaves the way most people expect, and multiprocessing has object state issues for objects passed from the parent process into the child process.
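A minimal demonstration of that difference (this assumes a POSIX system where the 'fork' start method is available; nothing here is pwntools API):

```python
import threading
import multiprocessing

state = {'count': 0}

def bump():
    # mutate what we hope is a shared object
    state['count'] += 1

# threads share the parent's objects directly,
# so the mutation is visible in the parent
t = threading.Thread(target=bump)
t.start()
t.join()
assert state['count'] == 1

# a forked child only mutates its own copy;
# the parent's dict is untouched afterwards
ctx = multiprocessing.get_context('fork')
p = ctx.Process(target=bump)
p.start()
p.join()
assert state['count'] == 1  # still 1 in the parent
```

The same copy-on-fork behavior is what invalidates a tube passed from the parent into a multiprocessing worker: the child mutates its private copy of the socket state, and the parent's copy drifts out of sync.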

Anything Pwntools supports officially (i.e. if you intend to submit a pull request) should target the threading API first, and possibly support multiprocessing.dummy as a non-default option for performance (at the cost of many gotchas).

All of that said, this hasn't come up before, so perhaps you only need to hack a one-time solution rather than contribute something back to the core codebase. What are you actually trying to do / solve?

ZackNoyes commented 5 years ago

Yes, okay -- if other people would find value in this then I could look at contributing something, but otherwise I'll stick to my temporary solution, which, while not the most efficient, is 'good enough' and easy to implement.

MrQubo commented 1 week ago

Bumping the issue.

Nowadays we have async/await in python 3, which should be the way to go.
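For illustration, here is a minimal asyncio sketch of the "many concurrent connections" use case. None of this is pwntools API; the local server is a stand-in for a real remote target so the example is self-contained.

```python
import asyncio

# stand-in server so the sketch is self-contained
async def handle(reader, writer):
    writer.write(b'flag{example}\n')
    await writer.drain()
    writer.close()

# one "remote" session: connect, read a line, disconnect
async def grab(port):
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    data = await reader.readline()
    writer.close()
    return data

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    # 100 concurrent sessions multiplexed on a single event
    # loop -- no threads or subprocesses involved
    results = await asyncio.gather(*(grab(port) for _ in range(100)))
    server.close()
    await server.wait_closed()
    return results

results = asyncio.run(main())
```

An async pwntools tube would presumably expose awaitable counterparts of recv/send in a similar style, but the API shape is an open design question.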

Arusekk commented 1 week ago

Would you be willing to work on that? Surely some (most?) of the implementation could work with generator-based coroutines and py2-compatible syntax? Otherwise we can start working on pwntools 5 with all the outstanding major cleanups.

MrQubo commented 1 week ago

@Arusekk I've added this to my TODO list, but I won't have time for it until October.

I've never implemented an async library for Python. Do you have any recommendations on what to use, so that:

  1. the current performance of the sync version doesn't degrade;
  2. we don't end up with lots of separate logic for the sync and async versions?