JohnDMcMaster / usbrply

Replay USB messages from Wireshark (.cap) files
ISC License
303 stars 37 forks source link

python2 vs python3 support? #9

Closed JohnDMcMaster closed 4 years ago

JohnDMcMaster commented 5 years ago

Python2 is actually approaching EOL. Consider converting to python3

krasin commented 4 years ago

Theoretically, https://github.com/krasin/usbrply/commit/1c34b920e27415256806f47030b82d66cb68bea0 should fix this issue. I am very conflicted about it, because the replacement library is 5 times slower (due to python-pcapng being a pure-Python implementation).

My commit might serve those who are desperate for Python 3 support, but I would suggest us to take some time to consider options.

One direction I would like to explore is to try to speed up python-pcapng library. While I don't expect that native speed could be achieved, it's theoretically possible to reduce the gap.

Another possibility is to bring pylibpcap back to life. It does not seem infeasible, but will probably require to contact the original team.

It's also possible that I missed some other Python library capable of reading pcapng files.

krasin commented 4 years ago

Or maybe I am overthinking it. I don't know.

JohnDMcMaster commented 4 years ago

Do you have some performance numbers? For your typical dumps is it slow enough where its noticeable? I will also note though that I often process video stream pcaps which could become problematic

krasin commented 4 years ago

On my dumps (not too large) the times went from 4.5 seconds to ~24 seconds. It caused me to get from "oh, it's a bit slow" to "are we there yet?" state of mind, and it's usually not a good sign. That's why while technically my patch implements Python 3 support, I don't think it should be merged.

Higher-level question: I saw that you had (and then deleted) a C version of usbrply. Was the primary issue the hassle to compile it for Windows? Or something else?

JohnDMcMaster commented 4 years ago

Thanks for the timing info!

Eh well IMHO C was just overkill for this project. The Python code was fast enough and much more pleasant to hack on

With a goal though to make the JSON intermediate representation canonical, I might be okay with keeping the high level generation logic in Python, and bringing back simplified C code that converts pcap to .json (but doesn't do any code generation). Then python would read in the JSON and print things as needed. Some flexibility would be lost, but any issues could probably be resolved by adding additional JSON metadata

krasin commented 4 years ago

@JohnDMcMaster I have found a way how to parse pcapng to the full extent needed for usbrply, with no dependency on libpcap and with just about 10% slowdown of total usbrply runtime. But I know how to speed up usbrply by 20%, so pretty sure we can come out even after the change.

This is the way:

# Fast pcapng format parser. Focused on being fast and only doing what we need.                                                     
# Only supports Enhanced Packet Blocks, as it's what being used in usbrply.                                                         
# See more: https://github.com/pcapng/pcapng/                                                                                       
def loop_pcapng(fin, loop_fn):
    ts_resolution = 1e-6
    fmt = '<IIIIII'

    with open(fin, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
        file_len = mm.size()
        i = 0
    cur = mm
        while i < file_len:
            block_typ, block_len, magic, ts_hi, ts_lo, cap_len = struct.unpack(fmt, cur[i:i+24])
            if cur[i:i+4] == 0x0A0D0D0A: # Section Header Block                                                                     
                if magic == 0x1A2B3C4D:
                    fmt = '<IIIIII' # little-endian                                                                                 
                else:
                    fmt = '>IIIIII' # big-endian                                                                                    
                # Update values based on the new endianness.                                                                        
                block_typ, block_len, magic, ts_hi, ts_lo, cap_len = struct.unpack(fmt, cur[i:i+24])
            if block_typ == 0x00000006:
                # Enhanced Packet Block. This is where we get USB data from.                                                        
                packet_data = cur[i+28:i+28+cap_len]
                ts = ((ts_hi << 32) + ts_lo) * ts_resolution
                loop_fn(cap_len, packet_data, ts)
            i += block_len

It makes a few reasonable assumptions based on the fact that we only use this parser for usbrply:

  1. It only looks into EnhancedPacket blocks. This is where wireshark puts USB data anyway. The same assumption is in my patch that uses python-pcapng library.
  2. It only works with well-formed pcapng files. If we want to deal with truncated dump files, we would just add a single if line into the loop to avoid crashing.
  3. It assumes that timestamp resolution is 1 ms, which is true for Wireshark + usbmon. But if needed, it will be ~12 lines of code to parse if_tsresol option in Interface Statistics Block.

I don't think that I made any other simplifying assumptions. The code is short, because pcapng format is simple & clean. Plus we only need to comprehend a tiny subset.

Let me know what you think.

JohnDMcMaster commented 4 years ago

I've merged your smaller PR. Let me think this over

On Sat, Mar 14, 2020 at 9:14 PM Ivan Krasin notifications@github.com wrote:

@JohnDMcMaster https://github.com/JohnDMcMaster I have found a way how to parse pcapng to the full extent needed for usbrply, with no dependency on libpcap and with just about 10% slowdown of total usbrply runtime. But I know how to speed up usbrply by 20%, so pretty sure we can come out even after the change.

This is the way:

Fast pcapng format parser. Focused on being fast and only doing what we need.

Only supports Enhanced Packet Blocks, as it's what being used in usbrply.

See more: https://github.com/pcapng/pcapng/

def loop_pcapng(fin, loop_fn): ts_resolution = 1e-6 fmt = '<IIIIII'

with open(fin, "r+b") as f:

mm = mmap.mmap(f.fileno(), 0) file_len = mm.size() i = 0 cur = mm while i < file_len: block_typ, block_len, magic, ts_hi, ts_lo, cap_len = struct.unpack(fmt, cur[i:i+24]) if cur[i:i+4] == 0x0A0D0D0A: # Section Header Block if magic == 0x1A2B3C4D: fmt = '<IIIIII' # little-endian else: fmt = '>IIIIII' # big-endian

Update values based on the new endianness.

        block_typ, block_len, magic, ts_hi, ts_lo, cap_len = struct.unpack(fmt, cur[i:i+24])
        if block_typ == 0x00000006:
            # Enhanced Packet Block. This is where we get USB data from.
            packet_data = cur[i+28:i+28+cap_len]
            ts = ((ts_hi << 32) + ts_lo) * ts_resolution
            loop_fn(cap_len, packet_data, ts)
        i += block_len

It makes a few reasonable assumptions based on the fact that we only use this parser for usbrply:

  1. It only looks into EnhancedPacket blocks. This is where wireshark puts USB data anyway. The same assumption is in my patch that uses python-pcapng library.
  2. It only works with well-formed pcapng files. If we want to deal with truncated dump files, we would just add a single if line into the loop to avoid crashing.
  3. It assumes that timestamp resolution is 1 ms, which is true for Wireshark + usbmon. But if needed, it will be ~12 lines of code to parse if_tsresol option in Interface Statistics Block.

I don't think that I made any other simplifying assumptions. The code is short, because pcapng format is simple & clean. Plus we only need to comprehend a tiny subset.

Let me know what you think.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JohnDMcMaster/usbrply/issues/9#issuecomment-599165676, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABWHW6DWP3CQHP6WE73F5DRHRI3HANCNFSM4IXVIXNQ .

krasin commented 4 years ago

No rush. Also, to make it clear: I don't have any interest in any particular solution being adopted. Getting Python3 properly supported will eliminate the need of a special Ubuntu 16.04 VM that I use for usbrply. That would be nice. But the way we achieve it has zero impact on my usbrply usage.

My line of thinking was to look into python-pcapng and try to understand why is it so slow. Apparently, they made the code a bit too flexible. The following is in the most tight loop:

def read_int(stream, size, signed=False, endianness='='):
    """
    Read (and decode) an integer number from a binary stream.
    :param stream: an object providing a ``read()`` method
    :param size: the size, in bits, of the number to be read.
        Supported sizes are: 8, 16, 32 and 64 bits.
    :param signed: Whether a signed or unsigned number is required.
        Defaults to ``False`` (unsigned int).
    :param endianness: specify the endianness to use to decode the number,
        in the same format used by Python :py:mod:`struct` module.
        Defaults to '=' (native endianness). '!' means "network" endianness
        (big endian), '<' little endian, '>' big endian.
    :return: the read integer number
    """
    fmt = INT_FORMATS.get(size)
    fmt = fmt.lower() if signed else fmt.upper()
    assert endianness in '<>!='
    fmt = endianness + fmt
    size_bytes = size // 8
    data = read_bytes(stream, size_bytes)
    return struct.unpack(fmt, data)[0]

And there is a lot of indirections on the top layers. And this is why is it so slow.

I decided to understand how fast could it be. The code snippet above is the result of this investigation. It was surprisingly straightforward and the code is as fast as we could get in pure Python.

JohnDMcMaster commented 4 years ago

Hmm interesting. Maybe we could ping the python-pcapng devs on the performance issue and see if they have any interest in your findings?

krasin commented 4 years ago

Would you like to try doing that @JohnDMcMaster? Unlike myself, you can speak with authority about usbrply.

I will certainly be available for any questions or even pull requests to python-pcapng.

JohnDMcMaster commented 4 years ago

Sure, I'll reach out, but if they respond I may need some help with specifics since you looked into the technical side of the issue

On Sat, Mar 14, 2020 at 9:54 PM Ivan Krasin notifications@github.com wrote:

Would you like to try doing that @JohnDMcMaster https://github.com/JohnDMcMaster? Unlike myself, you can speak with authority about usbrply.

I will certainly be available for any questions or even pull requests to python-pcapng https://github.com/rshk/python-pcapng/.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JohnDMcMaster/usbrply/issues/9#issuecomment-599167938, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABWHW6MRXUAWZ3SPH7HCY3RHRNOZANCNFSM4IXVIXNQ .

krasin commented 4 years ago

Sounds good @JohnDMcMaster! Thank you for being so pleasant to interact with.

JohnDMcMaster commented 4 years ago

np! Thanks for your PRs!

Other: wonder how much work it would be to port the original usb library to python3? In parallel I might reach out to them

On Sat, Mar 14, 2020 at 9:57 PM Ivan Krasin notifications@github.com wrote:

Sounds good @JohnDMcMaster https://github.com/JohnDMcMaster! Thank you for being so pleasant to interact with.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JohnDMcMaster/usbrply/issues/9#issuecomment-599168118, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABWHW3QA3YMOV3IXZGD3F3RHRN2ZANCNFSM4IXVIXNQ .

krasin commented 4 years ago

That's a very good idea, @JohnDMcMaster. I think it would be pretty straightforward to port, but there might be some other things which have rotten since the time they touched the library last time.

For one, the source code is still on CVS and SourceForge no longer allows to modify it. So, migrating to a new repo will probably be needed. Not a big deal, but one of the items in the todo list.

JohnDMcMaster commented 4 years ago

FYI I have some windows captures I need to parse, so in combination with the recent project interest I'm going to make a stab at a real overhaul. In particular I'm going to focus on https://github.com/JohnDMcMaster/usbrply/issues/13 The end result should be that there will be a set of packet source engines you can select, say one for Windows pcap and one for Linux pcap. This should also make it so that we can start merging in support for other pcap parsers (or even say Beagle) if we really want to and let the user choose which they want to use.

Flameeyes commented 4 years ago

JFYI, https://github.com/Flameeyes/usbmon-tools already handles a bit of parsing of this via python-pcapng -- and I'm happy to expand that as needed. I also just dropped the CLA requirements that were there for my previous employer.

JohnDMcMaster commented 4 years ago

Hi Flameeyes! I think usbmon-tools got brought up before but the issues were -1) usbrply parsing was too messy at the time to make looking at other parsing libraries easy (this is fixed now) -2) it wasn't clear what specifically could be leveraged. can you provide more specific information / example? -3) Possibly requires python 3.7? (https://github.com/Flameeyes/usbmon-tools/blob/master/usbmon/tools/chatter_hid.py#L36) Unfortunately I'm still running python 3.5. Not to say this isn't fixable in one way or another, but it makes its less appealing to me

Flameeyes commented 4 years ago

Pretty much usbmon-tools overlap with this library in the parsing of pcapng files — it includes slightly abstracted classes that provide the same interface for Linux and Windows captures, as well. The main advantage would be to have one "common" implementation for those parsing.

As for Python version — I've been targeting 3.7 because that's the one I have around. I'm fairly sure it can run on 3.6 with pretty much no changes, but construct does not officially support 3.5.

JohnDMcMaster commented 4 years ago

@krasin @Flameeyes Please see I just merged in pcapng support. This should make python3 support significantly easier, although I haven't looked at that yet.

Flameeyes commented 4 years ago

For what it's worth I plan on tagging a new (py3-only, >=3.5) release for python-pcapng (version 2.0.0) after I merge write support (https://github.com/rshk/python-pcapng/pull/29, WIP).

The current version (1.0.0) should work with Py2 and Py3, and 2.0.0 shouldn't break API directly, but I'll make sure to test your module before I make a release.

JohnDMcMaster commented 4 years ago

python3 seems to work in d6b3ed6

@Flameeyes got it, thanks for the heads up. I don't think this will be a problem (ie if you drop py2 support)