cognidox / OfficeToPDF

A command line tool to convert Microsoft Office documents to PDFs
https://www.cognidox.com/

Feature/watchdog thread #80

Open asparrowhawk opened 2 years ago

asparrowhawk commented 2 years ago

Watchdog

This pull request contains a significant number of changes. Most notable is the addition of a native assembly containing COM interop code that retrieves the process ID of a COM server given an IUnknown interface pointer. This interface typically belongs to the Office application that is being automated. A separate README.md in the COMServer project directory provides additional information and insight into how it is implemented and used.

With the functionality delivered by the COMServer assembly, a new /timeout <seconds> command line option was added that specifies how long the OfficeToPDF application should wait for the conversion to complete. This involved the addition of new classes and interfaces.
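As a rough illustration of the /timeout idea only (a hypothetical sketch, not the classes added by this pull request), the conversion can be run on a worker task while the caller waits for a bounded time. The names TimeoutSketch, RunWithTimeout and onTimeout are assumptions made for this example; in the real code the timeout action would presumably use the process ID obtained through COMServer to terminate the Office application.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not the Watchdog implementation from the PR.
public static class TimeoutSketch
{
    // Runs 'conversion' on a worker task and waits up to 'timeoutSeconds'.
    // Returns true if the work finished in time. Otherwise it calls 'onTimeout'
    // (for example, to kill the automated Office process) and returns false.
    // If 'conversion' throws, Task.Wait rethrows it wrapped in an AggregateException.
    public static bool RunWithTimeout(Action conversion, int timeoutSeconds, Action onTimeout)
    {
        Task work = Task.Run(conversion);
        if (work.Wait(TimeSpan.FromSeconds(timeoutSeconds)))
        {
            return true;
        }

        onTimeout();
        return false;
    }
}
```

Note that the worker task is not cancelled here; a hung COM call cannot be interrupted from managed code, which is likely why terminating the server process is the practical way out.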

Another change is the addition of the ArgParser class. This class derives from System.Collections.Hashtable, which the original code used directly. The new class pulls together all the related command line parsing logic and state. It also adds type-safe access to the contents of the hash table, removing the need for client code to cast to the expected type at the point of access.

For example:

Boolean running = (Boolean)options["noquit"];

becomes

Boolean running = options.noquit;
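One way such a typed accessor could be layered over a Hashtable is sketched below. This is an assumption about the general shape, not the actual ArgParser source; the class name ArgParserSketch and the default value are made up for the example.

```csharp
using System;
using System.Collections;

// Hypothetical sketch; the real ArgParser in this PR may be structured differently.
public class ArgParserSketch : Hashtable
{
    public ArgParserSketch()
    {
        // Assumed default so the typed getter never sees a missing key.
        this["noquit"] = false;
    }

    // The cast lives in one place instead of at every call site.
    public bool noquit
    {
        get { return (bool)this["noquit"]; }
        set { this["noquit"] = value; }
    }
}
```

Because the class still derives from Hashtable, existing code that indexes by string key keeps working while new code uses the property.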

The code changes also introduce the IConverter interface and ConverterFactory class. These provide a uniform function for conversion and a simple way to create the required converter based on the source filename's extension.
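The extension-based lookup can be pictured roughly as follows. The member names, the StubConverter class and the three extensions are assumptions for illustration; the PR's own IConverter and ConverterFactory may have different signatures and cover more file types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical interface shape; the real IConverter may differ.
public interface IConverterSketch
{
    int Convert(string inputFile, string outputFile);
}

// Stand-in for the real Word/Excel/PowerPoint converters.
public sealed class StubConverter : IConverterSketch
{
    public string Name { get; }
    public StubConverter(string name) { Name = name; }
    public int Convert(string inputFile, string outputFile) { return 0; }
}

public static class ConverterFactorySketch
{
    // Maps a file extension (case-insensitive) to a function that creates a converter.
    private static readonly Dictionary<string, Func<IConverterSketch>> Creators =
        new Dictionary<string, Func<IConverterSketch>>(StringComparer.OrdinalIgnoreCase)
        {
            { ".docx", () => new StubConverter("word") },
            { ".xlsx", () => new StubConverter("excel") },
            { ".pptx", () => new StubConverter("powerpoint") },
        };

    // Returns a converter for the file's extension, or null if none is registered.
    public static IConverterSketch Create(string inputFile)
    {
        string extension = Path.GetExtension(inputFile);
        Func<IConverterSketch> creator;
        if (extension != null && Creators.TryGetValue(extension, out creator))
        {
            return creator();
        }
        return null;
    }
}
```

With this arrangement the caller only needs `ConverterFactorySketch.Create(path)` followed by `Convert(...)`, regardless of which Office application ends up doing the work.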

All of the Converter class implementations remain as per the original master branch. Any changes were limited to using the ArgParser class instead of the Hashtable.

Also added were a number of NUnit-based tests that verify the behaviour of many of the new classes. The test project also contains a number of 'Explicit' tests that can be used to verify the behaviour of the Watchdog and to confirm that the COMServer code retrieves the process ID.

There is also a GitHub Actions workflow that builds the source code and runs the unit tests that are NOT marked as Explicit. See .github\workflows\build.yml, which is part of the solution.

With the introduction of the COMServer assembly, the projects must be built as either x86 or x64 (the default). Therefore the "Mixed Platforms" and "Any CPU" build configurations have been removed from the projects and the solution.

For the unit tests, some custom MSBuild configuration copies the required native assembly to the output directory:

  <Target Name="CopyDependents" AfterTargets="Build">
    <ItemGroup>
      <DependentFile Include="COMServer.dll;COMServer.pdb" />
    </ItemGroup>
    <PropertyGroup Condition="'$(Platform)' == 'x86'">
      <DependentDir>$(Configuration)</DependentDir>
    </PropertyGroup>
    <PropertyGroup Condition="'$(Platform)' != 'x86'">
      <DependentDir>$(Platform)\$(Configuration)</DependentDir>
    </PropertyGroup>
    <Copy SourceFiles="@(DependentFile -> '$(SolutionDir)COMServer\$(DependentDir)\%(Identity)')" DestinationFolder="$(TargetDir)" SkipUnchangedFiles="true" Condition="'$(NCrunch)' != '1'" />
  </Target>

The projects have been updated to use .NET Framework version 4.8. All work was carried out using Visual Studio 2022. The existing NuGet packages were NOT updated. The Unit test project references the latest NUnit NuGet packages.

Deployment still requires only the OfficeToPdf.exe and OfficeToPdf.exe.config files. The COMServer.dll has been added to the OfficeToPdf project as an embedded resource, as detailed in the Costura documentation.

asparrowhawk commented 2 years ago

I am having problems with GitHub Actions. The build agents do not seem to have the Microsoft.mshtml.dll assembly registered on them. This is required in order to build the COMServer and OfficeToPdf projects due to their reliance on COM and the Office Primary Interop Assemblies.

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Current\Bin\Microsoft.Common.CurrentVersion.targets(2926,5): warning MSB3284: Cannot get the file path for type library "0002e157-0000-0000-c000-000000000046" version 5.3. Library not registered. (Exception from HRESULT: 0x8002801D (TYPE_E_LIBNOTREGISTERED)) [D:\a\OfficeToPDF\OfficeToPDF\OfficeToPDF\OfficeToPDF.csproj]
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Current\Bin\Microsoft.Common.CurrentVersion.targets(2926,5): warning MSB3283: Cannot find wrapper assembly for type library "MSHTML". Verify that (1) the COM component is registered correctly and (2) your target platform is the same as the bitness of the COM component. For example, if the COM component is 32-bit, your target platform must not be 64-bit. [D:\a\OfficeToPDF\OfficeToPDF\OfficeToPDF\OfficeToPDF.csproj]

I have tried registering the Microsoft.mshtml.dll assembly as part of the workflow, but that needs admin permissions on the build agent.

I will have to look at other workarounds, including the possibility of using a Docker container to build the code. This may take a little time.

asparrowhawk commented 2 years ago

I have solved the build issue by removing the COM references to MSHTML and VBIDE from the OfficeToPdf project.