enthought / comtypes

A pure Python, lightweight COM client and server framework, based on the ctypes Python FFI package.

'SyntaxError: invalid syntax' occurs during MS Project module generation with GetModule function #524

Closed: kateryna-ruzhytska closed this issue 2 months ago.

kateryna-ruzhytska commented 3 months ago

Starting from version 1.4.0, the following raises 'SyntaxError: invalid syntax':

    import comtypes.client as com

    MS_PROJECT_HASH = '{A7107640-94DF-1068-855E-00DD01075445}'
    com.GetModule((MS_PROJECT_HASH, 4, 0))

      File "C:\Python39\lib\site-packages\comtypes\gen\MSHTML.py", line 4
        from comtypes.gen._3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0 import
                                                                            ^
    SyntaxError: invalid syntax


The issue reproduces intermittently. Sometimes the appropriate classes are imported, but is the second import expected?


junkmd commented 3 months ago

> Sometimes the appropriate classes are imported, but is the second import expected?

Yes, importing the classes in "the second import" is the expected behavior.

junkmd commented 3 months ago

This is the same issue that's happening with #517. It's possible that it has been latent since comtypes==1.3.0. I thought this was a partial file issue like #114, but it seems to be different.

What's puzzling is the situation in which this problem occurs. As you say, it happens occasionally and sometimes it doesn't.

Please share the codebase of _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py (hereafter, the wrapper module) when the codebase of MSHTML.py (hereafter, the friendly module) causes a SyntaxError.
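The wrapper module's file name seen in this thread looks like it is built from the type library's GUID followed by its LCID and major/minor version numbers. As an aid for locating the right file under comtypes\gen, here is a minimal sketch of that naming, assuming the pattern `_<GUID with '-' replaced by '_'>_<lcid>_<major>_<minor>`; the helper `wrapper_module_name` is hypothetical and not a comtypes API.

```python
def wrapper_module_name(guid: str, lcid: int, major: int, minor: int) -> str:
    """Hypothetical helper: build a wrapper module name in the
    `_<guid>_<lcid>_<major>_<minor>` form seen in this thread (assumed scheme)."""
    # Strip the surrounding braces and turn dashes into underscores so the
    # result is a valid Python identifier.
    core = guid.strip("{}").replace("-", "_")
    return f"_{core}_{lcid}_{major}_{minor}"


name = wrapper_module_name("{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 0, 4, 0)
print(name)  # _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0
```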

When this problem is reproduced, the right approach will depend on which of the following is happening:

The import part of the friendly module generated by CodeGenerator is built from the set of names defined in the wrapper module. https://github.com/enthought/comtypes/blob/7fa88e132f0b0aa3165db1a243e7ee623771ac90/comtypes/tools/codegenerator.py#L622-L640 If anything makes this set empty, invalid code will be generated. However, at the moment I can't think of anything that would make the set empty.
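To illustrate why an empty name set is fatal: if the import line is produced by joining the names, an empty set leaves nothing after `import`, which matches the line shown in the SyntaxError traceback. The sketch below is a hypothetical simplification (the function names and exact formatting are assumptions, not CodeGenerator's real output); it uses `compile()` so nothing is actually imported.

```python
def render_import_line(wrapper_modname, names):
    """Hypothetical sketch: render the friendly module's import line from the
    set of names defined in the wrapper module. Real CodeGenerator output may
    be formatted differently."""
    return f"from comtypes.gen.{wrapper_modname} import " + ", ".join(sorted(names))


def compiles(source):
    """Return True if `source` is syntactically valid Python (nothing is imported)."""
    try:
        compile(source, "<friendly module>", "exec")
    except SyntaxError:
        return False
    return True


WRAPPER = "_3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0"

good = render_import_line(WRAPPER, {"HTMLDocument", "IHTMLDocument2"})
bad = render_import_line(WRAPPER, set())  # empty set of names

print(compiles(good))  # True
print(compiles(bad))   # False: nothing follows "import"
```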

Help from the community is also welcome.

kateryna-ruzhytska commented 3 months ago

Could you please try to reproduce this issue with the steps below?

Precondition: comtypes version is 1.4.0

Steps to reproduce:

  1. Remove folder \Lib\site-packages\comtypes\gen
  2. Run command in Command Prompt: python -c "import comtypes.client as com; com.GetModule('C:\Windows\System32\mshtml.tlb')"

NOTE: you may need to perform the steps above several times to reproduce.

Actual result:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\client\_generate.py", line 124, in GetModule
        return ModuleGenerator().generate(tlib, pathname)
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\client\_generate.py", line 203, in generate
        return self._create_friendly_module(tlib, modulename)
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\client\_generate.py", line 222, in _create_friendly_module
        return _create_module_in_file(modulename, code)
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\client\_generate.py", line 172, in _create_module_in_file
        return _my_import(modulename)
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\client\_generate.py", line 27, in _my_import
        return importlib.import_module(fullname)
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
      File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
      File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 790, in exec_module
      File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\site-packages\comtypes\gen\MSHTML.py", line 2882, in <module>
        class _htmlInput(IntFlag):
      File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\enum.py", line 264, in __new__
        enum_member = __new__(enum_class, *args)
    TypeError: int() argument must be a string, a bytes-like object or a number, not '_coclass_meta'

junkmd commented 3 months ago

> 2. python -c "import comtypes.client as com; com.GetModule('C:\Windows\System32\mshtml.tlb')"

> class _htmlInput(IntFlag):
>   File "C:\Users\Kateryna\.pyenv\pyenv-win\versions\3.9.4\lib\enum.py", line 264, in __new__
>     enum_member = __new__(enum_class, *args)
> TypeError: int() argument must be a string, a bytes-like object or a number, not '_coclass_meta'

I encountered the same error in my environment. I deleted .../comtypes/gen/... and ran the command several more times, but the error occurred every time.

After checking the source code and jumping to the definition, the cause appears to be that the name htmlInputImage exists in the COM type library both as one of the "values for enumeration '_htmlInput'" and as a CoClass. (I don't know why the error occurs when mshtml.tlb is passed to generate MSHTML.py, while in your environment there is sometimes no error when MSHTML.py is generated as a side effect of passing (MS_PROJECT_HASH, 4, 0).)

Such COM type libraries were overlooked when implementing this feature.

I think the way to deal with this kind of problem is to assign numerical literals directly, rather than referring to the values of enumeration members from __wrapper_module__.
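The clash can be reproduced without COM at all. This minimal sketch assumes that the wrapper module first binds htmlInputImage to its enum value and later rebinds the same name to the CoClass; the metaclass `_coclass_meta` here is a hypothetical stand-in, and the value 1 is made up. Reading the member value back from the wrapper hands a class to IntFlag, which raises the same kind of TypeError as in the traceback, while a numeric literal works.

```python
from enum import IntFlag
from types import SimpleNamespace


class _coclass_meta(type):
    """Hypothetical stand-in for the metaclass of a generated CoClass."""


# Hypothetical model of the wrapper module: the enum value is bound first
# (the value 1 is made up), then a CoClass with the same name rebinds it.
wrapper = SimpleNamespace()
wrapper.htmlInputImage = 1
wrapper.htmlInputImage = _coclass_meta("htmlInputImage", (), {})


def enum_reading_from_wrapper():
    # Member value looked up in the wrapper module: now a class, not an int.
    class _htmlInput(IntFlag):
        htmlInputImage = wrapper.htmlInputImage

    return _htmlInput


def enum_with_literals():
    # Member value emitted as a numeric literal, as proposed above.
    class _htmlInput(IntFlag):
        htmlInputImage = 1

    return _htmlInput


try:
    enum_reading_from_wrapper()
except TypeError as exc:
    print("TypeError:", exc)  # message ends with: not '_coclass_meta'

print(int(enum_with_literals().htmlInputImage))  # 1
```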

junkmd commented 3 months ago

I have made changes to numerical literals to be members of the enumeration. This should prevent raising the TypeError. https://github.com/junkmd/comtypes/tree/assign_numerical_literals_to_enum_members

Please install it in your environment and give it a try.

junkmd commented 3 months ago

From https://github.com/enthought/comtypes/issues/524#issuecomment-2045178688,

> Please share the codebase of _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py (hereafter, the wrapper module) when the codebase of MSHTML.py (hereafter, the friendly module) causes a SyntaxError.

I am also waiting for information on this matter.

kateryna-ruzhytska commented 3 months ago

I attached _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py from a run where the SyntaxError occurs (converted to .txt so it could be attached): _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.txt

junkmd commented 3 months ago

I've looked at the code generated in your environment, and it seems that there are no elements in the codebase of the wrapper module itself that would cause a SyntaxError. If you do from comtypes.gen import _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0 in your environment, it probably won't cause an error. Therefore, I'm guessing that the problem is not with the generation of the wrapper module, but with the generation of the friendly module.
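One way to check which generated file is actually broken, without importing anything or triggering regeneration, is to compile each file's source. A minimal sketch follows; the helper name `syntax_error_in` and the stand-in file names are hypothetical. It passes bytes to `compile()` so that any encoding declaration in the file is respected.

```python
import tempfile
from pathlib import Path


def syntax_error_in(path):
    """Return the SyntaxError raised when compiling the file at `path`, or None.

    Compiling (rather than importing) checks the generated code without
    executing it, so the wrapper module and the friendly module can be
    checked independently.
    """
    source = Path(path).read_bytes()
    try:
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        return exc
    return None


# Usage with stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    wrapper = Path(tmp) / "wrapper.py"
    friendly = Path(tmp) / "friendly.py"
    wrapper.write_text("x = 1\n")
    friendly.write_text("from comtypes.gen._abc import\n")
    print(syntax_error_in(wrapper))               # None
    print(syntax_error_in(friendly) is not None)  # True
```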

> I have made changes to numerical literals to be members of the enumeration. This should prevent raising the TypeError. https://github.com/junkmd/comtypes/tree/assign_numerical_literals_to_enum_members
>
>   • pip install https://github.com/junkmd/comtypes/archive/refs/heads/assign_numerical_literals_to_enum_members.zip
>
> Please install it in your environment and give it a try.

Please share the results about this as well.

Also, when sharing such a codebase, it would be helpful if you could upload it to a public repository of yours and share the permalink, because attached files can't be viewed right away and GitHub's syntax highlighting can't be used.

junkmd commented 3 months ago

There has been a comment about mshtml.tlb in this project since its inception.

https://github.com/enthought/comtypes/blob/7fa88e132f0b0aa3165db1a243e7ee623771ac90/comtypes/tools/tlbparser.py#L721-L723

In your case, tlbparser.py is not stuck in an infinite loop; rather, the error occurs during code generation, after the type libraries have been parsed.

Therefore, it seems that compatibility between mshtml.tlb and comtypes has improved since this comment was committed (although it is not clear why).

kateryna-ruzhytska commented 3 months ago

> I have made changes to numerical literals to be members of the enumeration. This should prevent raising the TypeError. https://github.com/junkmd/comtypes/tree/assign_numerical_literals_to_enum_members
>
>   • pip install https://github.com/junkmd/comtypes/archive/refs/heads/assign_numerical_literals_to_enum_members.zip
>
> Please install it in your environment and give it a try.

Those changes help.

junkmd commented 3 months ago

The cause of this SyntaxError problem could be that the wrapper module already exists, but the friendly module does not.

https://github.com/enthought/comtypes/blob/7fa88e132f0b0aa3165db1a243e7ee623771ac90/comtypes/client/_generate.py#L193-L203

https://github.com/enthought/comtypes/blob/7fa88e132f0b0aa3165db1a243e7ee623771ac90/comtypes/client/_generate.py#L205-L222

https://github.com/enthought/comtypes/blob/7fa88e132f0b0aa3165db1a243e7ee623771ac90/comtypes/client/_generate.py#L224-L246

ModuleGenerator does not generate a wrapper module with CodeGenerator if the wrapper module already exists. Friendly modules are generated using the names of the objects defined while generating the wrapper module. In other words, if the wrapper-generation step is skipped, the friendly module is generated incompletely.

I think this is similar to issue #114 in terms of the event and cause. If Python terminates after the wrapper module is generated by calling comtypes.client.GetModule, and before the friendly module is generated, this error can occur the next time comtypes.client.GetModule is called.
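The failure mode described above can be modelled in a few lines. This is a hypothetical simulation, not the real ModuleGenerator: a dict stands in for the comtypes\gen directory, and the names for the friendly module's import are only collected when the wrapper module is written during the same call.

```python
def generate(files, wrapper, friendly, defined_names):
    """Hypothetical model of the flow described above (not comtypes' code).

    `files` stands in for the comtypes/gen directory (file name -> source).
    """
    names = set()
    if wrapper not in files:
        files[wrapper] = "# wrapper module code\n"
        names = set(defined_names)  # names are only known after generating
    modname = wrapper[: -len(".py")]
    imports = ", ".join(sorted(names))
    files[friendly] = f"from comtypes.gen.{modname} import {imports}\n"


def compiles(source):
    try:
        compile(source, "<friendly>", "exec")
        return True
    except SyntaxError:
        return False


NAMES = {"HTMLDocument", "IHTMLDocument2"}
WRAPPER = "_3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py"

# Normal run: neither file exists, so both are generated consistently.
fresh = {}
generate(fresh, WRAPPER, "MSHTML.py", NAMES)
print(compiles(fresh["MSHTML.py"]))  # True

# Interrupted earlier run: the wrapper file survived, the friendly one did not.
leftover = {WRAPPER: "# wrapper module code\n"}
generate(leftover, WRAPPER, "MSHTML.py", NAMES)
print(compiles(leftover["MSHTML.py"]))  # False
```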

By determining if the wrapper module already exists and the friendly module does not, and displaying an appropriate error message, we can help users understand what to do next to resolve the error.
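A check of this kind might look like the following minimal sketch. The function `check_generated_pair`, the exception `PartialGenerationError`, and the message wording are all hypothetical; they only show the "wrapper exists, friendly missing" condition and a message that tells the user what to do next.

```python
import tempfile
from pathlib import Path


class PartialGenerationError(Exception):
    """Hypothetical error type used only for this sketch."""


def check_generated_pair(gen_dir, wrapper_name, friendly_name):
    """Raise a descriptive error when only the wrapper module file exists."""
    wrapper = Path(gen_dir) / f"{wrapper_name}.py"
    friendly = Path(gen_dir) / f"{friendly_name}.py"
    if wrapper.exists() and not friendly.exists():
        raise PartialGenerationError(
            f"{wrapper.name} exists but {friendly.name} does not; an earlier "
            f"GetModule call was probably interrupted. Delete {wrapper} and "
            "call comtypes.client.GetModule again."
        )


# Usage: simulate a comtypes\gen directory that only contains the wrapper.
with tempfile.TemporaryDirectory() as gen_dir:
    Path(gen_dir, "_3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py").write_text("")
    try:
        check_generated_pair(
            gen_dir, "_3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0", "MSHTML"
        )
    except PartialGenerationError as exc:
        print(exc)
```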

Also, by not writing the wrapper module file until the code for the friendly module has also been generated, and then writing both files together once they are ready, I think we can reduce how often this error occurs.
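A minimal sketch of "write both files only once both are ready", assuming the generated code is already available as strings. `write_modules_together` is a hypothetical helper, and UTF-8 is an assumed encoding. Each file is staged in a temporary file in the same directory and moved into place with `os.replace`, so no file is ever half-written; two replaces are still not a single atomic step, but the window in which only one file exists becomes very small.

```python
import os
import tempfile
from pathlib import Path


def write_modules_together(gen_dir, sources):
    """Hypothetical helper: write generated module files only after all
    their code exists. `sources` maps file names to generated code."""
    gen_dir = Path(gen_dir)
    staged = []
    try:
        # Stage every file first; nothing is visible under its final name yet.
        for name, code in sources.items():
            fd, tmp = tempfile.mkstemp(dir=gen_dir, suffix=".tmp")
            staged.append((tmp, gen_dir / name))
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(code)
        # Then move all staged files into place back to back.
        for tmp, final in staged:
            os.replace(tmp, final)
    finally:
        # Remove any staged file that was not moved into place.
        for tmp, _ in staged:
            if os.path.exists(tmp):
                os.remove(tmp)


# Usage: both files appear only after both code strings exist.
with tempfile.TemporaryDirectory() as gen_dir:
    write_modules_together(gen_dir, {
        "_wrapper.py": "x = 1\n",
        "friendly.py": "from ._wrapper import x\n",
    })
    print(sorted(p.name for p in Path(gen_dir).iterdir()))  # ['_wrapper.py', 'friendly.py']
```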

junkmd commented 3 months ago

> Those changes help.

Thank you.

I’m planning to apply a patch to at least resolve the TypeError and release it as 1.4.1.

I would like a little more time to respond to the SyntaxError.

kateryna-ruzhytska commented 3 months ago

> The cause of this SyntaxError problem could be that the wrapper module already exists, but the friendly module does not.

Both modules are present when this issue occurs. I also uploaded both files to the public repository kateryna-ruzhytska/comtypes_syntax_error_1_4_0.

junkmd commented 3 months ago

Please keep the _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py intact, delete only the MSHTML.py, and then try calling comtypes.client.GetModule.

Let us know the result.

kateryna-ruzhytska commented 3 months ago

> Please keep the _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py intact, delete only the MSHTML.py, and then try calling comtypes.client.GetModule.
>
> Let us know the result.

In this case we still get the error and MSHTML.py is generated the same way again.

junkmd commented 3 months ago

>> Please keep the _3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py intact, delete only the MSHTML.py, and then try calling comtypes.client.GetModule.
>>
>> Let us know the result.
>
> In this case we still get the error and MSHTML.py is generated the same way again.

The same thing happened in my environment.

junkmd commented 3 months ago

I noticed that the condition for whether the ModuleGenerator imports an existing module or creates a new one, and the order of operations for (re)generating a module, are not appropriate.

This is the same kind of thing that I discussed in https://github.com/enthought/comtypes/pull/116#issuecomment-1159620446.

Regenerating both the wrapper module and the friendly module unless both already exist might make such partial files less likely.

In the current implementation, the wrapper module file is generated, then the codebase for the friendly module is generated, and then the friendly module file is generated. If Python crashes while generating the codebase for the friendly module, only the wrapper module will exist. After that, if we call comtypes.client.GetModule, only a partial friendly module will be created.

By generating the codebases for both the wrapper module and the friendly module first, and only then writing both module files, the time between writing the two files can be reduced, which could improve stability.
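The proposed decision rule and order of operations can be sketched as follows. This is a hypothetical outline, not the actual ModuleGenerator change: `ensure_modules`, `make_wrapper`, and `make_friendly` are illustrative names, and the two files are simply written back to back with `Path.write_text` (UTF-8 assumed).

```python
import tempfile
from pathlib import Path


def ensure_modules(gen_dir, wrapper_name, friendly_name, make_wrapper, make_friendly):
    """Hypothetical sketch of the proposed order of operations.

    make_wrapper() returns (wrapper_code, defined_names);
    make_friendly(defined_names) returns friendly_code.
    """
    gen_dir = Path(gen_dir)
    wrapper = gen_dir / f"{wrapper_name}.py"
    friendly = gen_dir / f"{friendly_name}.py"
    # Reuse the existing files only when BOTH are present.
    if wrapper.exists() and friendly.exists():
        return "reused"
    # 1. Generate both codebases in memory first ...
    wrapper_code, names = make_wrapper()
    friendly_code = make_friendly(names)
    # 2. ... and only then write both files, back to back.
    wrapper.write_text(wrapper_code, encoding="utf-8")
    friendly.write_text(friendly_code, encoding="utf-8")
    return "regenerated"


def make_wrapper():
    return "x = 1\n", {"x"}


def make_friendly(names):
    return f"from ._w import {', '.join(sorted(names))}\n"


with tempfile.TemporaryDirectory() as gen_dir:
    print(ensure_modules(gen_dir, "_w", "F", make_wrapper, make_friendly))  # regenerated
    Path(gen_dir, "F.py").unlink()  # simulate a run that died halfway
    print(ensure_modules(gen_dir, "_w", "F", make_wrapper, make_friendly))  # regenerated
    print(ensure_modules(gen_dir, "_w", "F", make_wrapper, make_friendly))  # reused
```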

junkmd commented 3 months ago

@kateryna-ruzhytska

Based on the considerations I made in https://github.com/enthought/comtypes/issues/524#issuecomment-2049546592, I changed the implementation of ModuleGenerator.

Please install this in your environment, try the following, and let us know the result:

Thank you.

junkmd commented 3 months ago

@kateryna-ruzhytska

In addition to the branch mentioned earlier, I would like you to pip install https://github.com/junkmd/comtypes/archive/refs/heads/fix_issue_524_syntaxerror_and_more.zip and test in your environment.

This branch merges the contents of #527, #528, and #529 into https://github.com/junkmd/comtypes/archive/refs/heads/fix_issue_524_syntaxerror.zip.

junkmd commented 2 months ago

I executed the contents of fix_issue_524_syntaxerror_and_more in my environment.

I confirmed that when only one of the wrapper module file and the friendly module file existed, both files (and modules) were regenerated.

Testing this requires diving into Python's import system, which is difficult.

However, with this implementation all the existing unit tests ran and passed. Afterwards, I deleted some of the .py files under comtypes.gen and ran the tests again: the deleted .py files were regenerated and all tests passed. And when both the friendly module file and the wrapper module file existed, they were not regenerated. Therefore, I consider it fully functional.
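The manual check described above (delete some generated files, confirm they come back, confirm an intact pair is reused) can be expressed as a small unittest sketch against a hypothetical predicate `needs_regeneration`. It exercises only the file-existence rule and deliberately stays away from Python's import machinery, which is the hard part noted above.

```python
import tempfile
import unittest
from pathlib import Path


def needs_regeneration(wrapper, friendly):
    """Hypothetical predicate: regenerate unless both module files exist."""
    return not (Path(wrapper).exists() and Path(friendly).exists())


class NeedsRegenerationTest(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        gen = Path(self._tmp.name)  # stands in for comtypes/gen
        self.wrapper = gen / "_3050F1C5_98B5_11CF_BB82_00AA00BDCE0B_0_4_0.py"
        self.friendly = gen / "MSHTML.py"

    def test_neither_exists(self):
        self.assertTrue(needs_regeneration(self.wrapper, self.friendly))

    def test_only_wrapper_exists(self):
        self.wrapper.write_text("")
        self.assertTrue(needs_regeneration(self.wrapper, self.friendly))

    def test_only_friendly_exists(self):
        self.friendly.write_text("")
        self.assertTrue(needs_regeneration(self.wrapper, self.friendly))

    def test_both_exist(self):
        self.wrapper.write_text("")
        self.friendly.write_text("")
        self.assertFalse(needs_regeneration(self.wrapper, self.friendly))
```

Saved to a file, this can be run with `python -m unittest`.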

In early May, I plan to merge this and release it as comtypes==1.4.2. If the community shares opinions before then, the plan may change.

Any opinions would be appreciated.