enthought / comtypes

A pure Python, lightweight COM client and server framework, based on the ctypes Python FFI package.
Other
291 stars 96 forks source link

Returning `out` arguments breaks when they must be preallocated #474

Closed jonschz closed 2 months ago

jonschz commented 1 year ago

The standard treatment of argument directions is as follows:

This model breaks down when a parameter marked as 'out' must be preallocated with some defined amount of memory. This happens for example in ISequentialStream::Read (or RemoteRead in my case, which has the same signature). The parameters are given by

        (['out'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbRead')

This call only works if pv is pre-allocated with cb bytes, which cannot be done in the current model.

I resorted to calling the base method ISequentialStream._ISequentialStream__com_RemoteRead with three arguments. My question is whether I am overlooking a more elegant solution. If that is not the case, I was wondering if there is a better way to treat this problem.

One other option is to change the generated header and set the first parameter to be ['in', 'out'], but this breaks if one re-generates the file and is also technically incorrect.

junkmd commented 1 year ago

Please write a short code snippets to reproduce the condition.

jonschz commented 1 year ago

I quickly put together a file based on your test you provided here:

https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121

Here I use the workaround described above:

https://github.com/jonschz/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127

As before, a portable device must be connected for the test to work. This test won't modify anything on the device, but it may download personal data to the host computer.

(not sure how one can get the code snippets to render here)

junkmd commented 1 year ago

If you change the USERNAME of https://github.com/{USERNAME}/comtypes/blob/... to the repository host, it will be rendered.

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121 renders …

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L116-L121

And https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127 renders …

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L122-L127

junkmd commented 1 year ago

The MemberSpec for the GetStream method of IPortableDeviceResources is made as follows.

# in `...\comtypes\gen\_1F001332_1A57_4934_BE31_AFFC99F4EE0A_0_1_0.py`
IPortableDeviceResources._methods_ = [
    ...
    COMMETHOD(
        [],
        HRESULT,
        'GetStream',
        (['in'], WSTRING, 'pszObjectID'),
        (['in'], POINTER(_tagpropertykey), 'key'),
        (['in'], c_ulong, 'dwMode'),
        (['in', 'out'], POINTER(c_ulong), 'pdwOptimalBufferSize'),
        (['out'], POINTER(POINTER(IStream)), 'ppStream')
    ),
    ...
]

Then, I have a few questions about some of your code snippets.

https://github.com/enthought/comtypes/blob/967b397981a0a0fc71bfc45430ca7e73c8b05de4/comtypes/test/test_portabledevice.py#L106-L114

jonschz commented 1 year ago

Why do you overwrite a null pointer IStream with the return value of resources.GetStream immediately after assigning it to pFileStream?(L106-L107) Why are you taking the pointer contents from the pFileStream?(L114)

Both of those are relics from someone else's earlier code that I forgot to remove, my bad. The updated code is here, and the behaviour has not changed.

junkmd commented 1 year ago

You should try to redefine ISequencialStream and ISequence on test_portabledevice like this. (import GUID, c_longlong, etc. as appropriate)

class ISequentialStream(comtypes.IUnknown):

    _case_insensitive_ = True
    _iid_ = GUID('{0C733A30-2A1C-11CE-ADE5-00AA0044773D}')
    _idlflags_ = []

    def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]:
        pv = (c_ubyte * size)()
        pcb_read = pointer(c_ulong(0))
        self.__com_RemoteRead(pv, c_ulong(size), pcb_read)
        return pv, pcb_read.contents.value  # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`

ISequentialStream._methods_ = [
    COMMETHOD(
        [],
        HRESULT,
        'RemoteRead',
        (['out'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbRead')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'RemoteWrite',
        (['in'], POINTER(c_ubyte), 'pv'),
        (['in'], c_ulong, 'cb'),
        (['out'], POINTER(c_ulong), 'pcbWritten')
    ),
]

_LARGE_INTEGER = c_longlong
_ULARGE_INTEGER = c_ulonglong

class tagSTATSTG(Structure):
    pass

class IStream(ISequentialStream):

    _case_insensitive_ = True
    _iid_ = GUID('{0000000C-0000-0000-C000-000000000046}')
    _idlflags_ = []

IStream._methods_ = [
    COMMETHOD(
        [],
        HRESULT,
        'RemoteSeek',
        (['in'], _LARGE_INTEGER, 'dlibMove'),
        (['in'], c_ulong, 'dwOrigin'),
        (['out'], POINTER(_ULARGE_INTEGER), 'plibNewPosition')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'SetSize',
        (['in'], _ULARGE_INTEGER, 'libNewSize')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'RemoteCopyTo',
        (['in'], POINTER(IStream), 'pstm'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['out'], POINTER(_ULARGE_INTEGER), 'pcbRead'),
        (['out'], POINTER(_ULARGE_INTEGER), 'pcbWritten')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Commit',
        (['in'], c_ulong, 'grfCommitFlags')
    ),
    COMMETHOD([], HRESULT, 'Revert'),
    COMMETHOD(
        [],
        HRESULT,
        'LockRegion',
        (['in'], _ULARGE_INTEGER, 'libOffset'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['in'], c_ulong, 'dwLockType')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'UnlockRegion',
        (['in'], _ULARGE_INTEGER, 'libOffset'),
        (['in'], _ULARGE_INTEGER, 'cb'),
        (['in'], c_ulong, 'dwLockType')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Stat',
        (['out'], POINTER(tagSTATSTG), 'pstatstg'),
        (['in'], c_ulong, 'grfStatFlag')
    ),
    COMMETHOD(
        [],
        HRESULT,
        'Clone',
        (['out'], POINTER(POINTER(IStream)), 'ppstm')
    ),
]

Then rewrite as follows;

https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L106-L112


                optimalTransferSizeBytes, fileStream = resources.GetStream(
                    objectID,
                    WPD_RESOURCE_DEFAULT,
                    STGM_READ,
                    optimalTransferSizeBytes,
                )
                fileStream = fileStream.QueryInterface(IStream)
                blockSize = optimalTransferSizeBytes.contents.value

https://github.com/enthought/comtypes/blob/1c8a9bd0564507d882bda52f07c60d56306ba08c/comtypes/test/test_portabledevice.py#L122-L125

                    _, data_read = fileStream.RemoteRead(blockSize)
                    print(f"Read data: {data_read}")

This way is WET and may not be very elegant because it redefines the classes. However, it can be avoid from the ugly situation of directly referencing a mangled private name.

edited: fix return value of RemoteRead

junkmd commented 1 year ago

If we were going to fix these problems permanently(in other words, "fix as no need workarounds defined by users"), we need the agreements with the community.

The followings are my shallow thoughts.

junkmd commented 1 year ago
  • I assumed that the case if "generally interfaces" such as IStream is defined statically. If that happens, the codegenerator will stop to define dynamically them, instead import them(see known_symbols) from statically modules. (This may also break backward compatibilities, but may be better than the aforementioned)

With regard to the aforementioned, I have done a little experimentation that are adding the module with interfaces statically defined and they are imported instead of being defined dynamically.

https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/objidl.py#L1-L140

https://github.com/enthought/comtypes/blob/cfb8ba1c15efcfd352806c671d4faa1f0b879a7b/comtypes/client/_generate.py#L249-L260

However, when it comes to introducing such these into production code (as I have said many times), we must be very careful about backward compatibility.

jonschz commented 1 year ago

I like the idea of statically importing these streams.

Edit: I was wrong, see the comment below ~Regarding your RemoteRead:~

def RemoteRead(self, size: int) -> tuple["Array[c_ubyte]", int]:
       pv = (c_ubyte * size)()
       pcb_read = pointer(c_ulong(0))
       self.__com_RemoteRead(pv, c_ulong(size), pcb_read)
       return pv, pcb_read.contents.value  # or `list(pv), pcb_read.contents.value`, or `bytes(pv), pcb_read.contents.value`

Regarding RemoteWrite: We don't technically need this one since it does not have the same problem as RemoteRead, but it is still convenient. I am more worried about backwards compatibility here since there might be code out there using the generated RemoteWrite, while all working code out there must have a workaround for RemoteRead in one way or another.

   def RemoteWrite(self, pv: bytes, cb: int) -> int:
        pv_ = (c_ubyte * len(pv)).from_buffer(bytearray(pv))
        pcb_written = pointer(c_ulong(0))
        self.__com_RemoteWrite(pv_, c_ulong(cb), pcb_written)  # type: ignore
        return pcb_written.contents.value

This works as expected. The code from https://github.com/KasparNagu/PortableDevices uses ctypes.create_string_buffer() and ctypes.cast() to cast it to a ubyte-array, but I think I prefer your solution. We could also contemplate removing cb as a parameter and just go with len(pv). If a user only wants to write part of some buffer, they can simply slice it.

junkmd commented 1 year ago

Thanks a lot!

After posting the sample code snippet, I still gave it some thought.

To test this RemoteRead, what is the suitable way? Your code is using the portabledevice api. But there might be other com libraries that handle files in IStream. I am not familiar with such libraries, so I am hoping you or other community members can suggest a way that is not environment dependent.

Edited: Fixed the part where I meant to RemoteWrite but I had written RemoteRead.

jonschz commented 1 year ago

Hi,

I have put together a unit test for IStream, see here. While I could not easily find a different importable library that uses IStream, CreateStreamOnHGlobal works just fine with the one from portable devices.

Also, my apologies - your suggested code for RemoteRead works just fine, I made a mistake in applying it 🤦‍♂️

Your suggestions sound reasonable, though I suspect that other interfaces which implement Read instead of RemoteRead may run into the same issue. I am not sure which conditions distinguish the use of RemoteRead vs. Read.

As far as I am concerned, statically importing ISequentialStream with a fixed RemoteRead would work for me.

junkmd commented 1 year ago

Your contribution helps us so much.

I found the SO that might help you about Read and RemoteRead.

https://stackoverflow.com/questions/19820999/what-are-the-remoteread-and-remotewrite-members-of-isequentialstream

Your code is a great use case for CreateStreamOnHGlobal in comtypes.

I am trying to figure out how to take this into the production code so that we can continue to operate well. Please wait a while.

Any opinions would be appreciated.

junkmd commented 1 year ago

As you say, the only thing that needs to be statically defined in the scope of this issue is ISequentialStream. In addition to overriding RemoteRead, adding a type-checking-only type hint to RemoteWrite would be sufficient for modern Python code.

Is this code okay for you? If you agree, I would like to hear from other maintainers on this code.

Also keep in mind, the drop_py2 has already many changes at this time, so it may likely be after the drop_py2 plan when the PR you would submit is merged.

jonschz commented 1 year ago

The code looks fine to me, thank you! I can also confirm the correct behaviour of the test (passes with the modification in _generate, fails without it).

so it may likely be after the drop_py2 plan when the PR you would submit is merged.

In that case, would it make more sense to create the pull request at a later point in time?

junkmd commented 1 year ago

@vasily-v-ryabov

Do you think these changes are reasonable to resolve this issue?

Even if these changes are acceptable for the project, I think it is out of scope of the drop_py2, such as #475.

How do you think this should go?

junkmd commented 4 months ago

@jonschz

Hi,

The codegenerator was determining whether the item is one of the known symbols based solely on its name.

However, with #529 and #534, the criteria for determining whether a COM interface is a known symbol now include not only the name but also the iid.

With those change, I think that ISequentialStream can be included in the __known_symbols__ of objidl.py without the risk of it being mistaken for another interface with the same name.

If you are still interested in this issue, feel free to PR.