adoptium / adoptium-support

For end-user problems reported with our binary distributions
Apache License 2.0

PageManager PDF Writer deadlocks PrintServices #71

Closed tresf closed 4 years ago

tresf commented 4 years ago

Steps to reproduce

  1. Install PageManager 9.5 for Windows (Trial version is available)
  2. Reboot
  3. Execute the following code:
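    import java.awt.print.PrinterJob;
    import javax.print.PrintService;
    import javax.print.attribute.standard.PrinterResolution;

    // Enumerate all installed print services and query the PageManager one
    PrintService[] printServices = PrinterJob.lookupPrintServices();
    for (PrintService printService : printServices) {
        if (printService.getName().equals("PageManager PDF Writer")) {
            System.out.println("Found " + printService);
            // This call never returns when the PageManager driver is installed
            printService.getDefaultAttributeValue(PrinterResolution.class);
            System.out.println("Found default attribute value"); // <----- NEVER GETS HERE
        }
    }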

Platform and architecture:

Workaround

Downstream Bug Report

tresf commented 4 years ago

Running the offending code through a debugger, the hang seems to occur in getCapabilities in Win32PrintService.java, which I believe calls a native function inside WPrinterJob.cpp.

tresf commented 4 years ago

Note: I've also reached out directly to NewSoft Technology Corporation, the current owners of the Presto! line, so that they're aware of the issue.

aahlenst commented 4 years ago

@tresf Before I spend time on this one: Do you have any news from anyone involved?

tresf commented 4 years ago

No reply from NewSoft, no updates on this issue since original filing.

aahlenst commented 4 years ago

@tresf Thanks for the great analysis (as usual). I tested with an upstream build of OpenJDK 11.0.7 and 14.0.1 and could verify the behavior. Looking at the C++ code and considering that it works well with other printers, I have doubts that the problem is actually in OpenJDK. Blacklisting printers in OpenJDK does not seem to be a sensible idea, either. Most likely, this problem is in PageManager. Do you agree on closing this issue? Otherwise, what's your suggestion?

tresf commented 4 years ago

Our solution downstream was to blacklist it by printer name, but that's only a partial workaround since the default printer name isn't guaranteed to stay that way.
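
For reference, a name-based filter along those lines could be sketched as follows. This is only an illustration of the approach, not the actual downstream code: the class PrinterNameFilter, the BLOCKED_NAMES set, and the safeNames helper are hypothetical names, and it works on plain name strings (as returned by PrintService.getName()) so it can be tried without the driver installed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PrinterNameFilter {
    // Assumption: the driver is only recognizable by its default display name.
    static final Set<String> BLOCKED_NAMES = Set.of("PageManager PDF Writer");

    // Returns the printer names that are considered safe to query further.
    static List<String> safeNames(List<String> allNames) {
        List<String> safe = new ArrayList<>();
        for (String name : allNames) {
            if (!BLOCKED_NAMES.contains(name)) {
                safe.add(name);
            }
        }
        return safe;
    }
}
```

In practice the names would come from calling getName() on each entry returned by PrinterJob.lookupPrintServices(). A printer that has been renamed slips straight through this filter, which is why it is only a partial workaround.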

> Blacklisting printers in OpenJDK does not seem to be a sensible idea

Agreed.

> Most likely, this problem is in PageManager.

Agreed.

> Do you agree on closing this issue? Otherwise, what's your suggestion?

If it's decided to close as wont-fix, I'm OK with that, but I don't have enough information to know if that's what I would recommend. For example, if it's a bug in how Java loads all print drivers that's being exposed by Presto!, the issue will eventually come back and thus guarding against this deadlock will help the JDK moving forward.

On the other hand, if the issue is unrelated to Java code, then the bug should go elsewhere (Presto! or Microsoft perhaps?). At the time of writing, Presto! has never gotten back to me (not even to acknowledge receiving the report).

Did you dig deep enough to see if the C++ code was stuck somewhere that could be safely escaped? I'm curious where it dies. Even if it's not fixed, referencing the CPP source might help it down the road if it's resurrected, or if Presto! finally decides to correct it.

aahlenst commented 4 years ago

The C++ implementation is at https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/a12f60a83fc87bbba2e2d5ade17f6241a8942aac/jdk/src/windows/native/sun/windows/WPrinterJob.cpp#L726. The corresponding Windows API looks like DeviceCapabilities (https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-devicecapabilitiesa), which handles the details and seems to be synchronous. I do not see what could be done to work around this problem, but this isn't my area of expertise.
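
Since the native call appears to be synchronous, one thing a caller could try from the Java side is to bound how long it waits by running the query on a separate daemon thread. The sketch below is a generic illustration, not something taken from OpenJDK or from any downstream project; BoundedCall and callWithTimeout are hypothetical names. It does not un-stick the driver: a thread blocked in native code generally cannot be interrupted, so the worker would stay hung in the background while the caller moves on.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    // Runs the task on a daemon thread and waits at most timeoutMillis.
    // Returns an empty Optional on timeout, on failure, or on a null result.
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "printer-query");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; has no effect on a native hang
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller might wrap the problematic query as BoundedCall.callWithTimeout(() -> printService.getDefaultAttributeValue(PrinterResolution.class), 5000) and treat an empty result as "unknown". The five-second timeout is an arbitrary assumption, and if the hung call holds a lock that later printing calls need, those calls could still block.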

tresf commented 4 years ago

@aahlenst Thanks kindly for linking the code. I haven't run it through a C++ debugger yet (I'm not a C++ developer, so if I need to do this, it will take some time). However, glancing at it, I noticed that the code guards against -1 in some places, and I'm curious whether it falls through in others.

Quoting Microsoft:

Return value

If the function succeeds, the return value depends on the setting of the fwCapability parameter. A return value of zero (0) generally indicates that, while the function completed successfully, there was some type of failure, such as a capability that is not supported. For more details, see the descriptions for the fwCapability values.

If the function returns -1, this may mean either that the capability is not supported or there was a general function failure.

So an example like this is guarded:

  int cReturned = ::DeviceCapabilities(printerName, printerPort,
                                         dc_id, NULL, NULL);
  RESTORE_CONTROLWORD
  if (cReturned <= 0) { // ######## GUARDED FOR 0 or -1
      JNU_ReleaseStringPlatformChars(env, printer, printerName);
      JNU_ReleaseStringPlatformChars(env, port, printerPort);
      return NULL;
  }

... however, some calls, such as those for copies and duplex, don't check the result with <= 0 or > 0.

Duplex worries me the most because it doesn't fall back on JNU_ReleaseStringPlatformChars like the other sanitized calls, but instead starts applying DWORD bitwise operations to a value that could be -1 or 0 according to the API.

I'm sorry for the speculative debugging, but I don't want to blame Presto! if the issue is an unchecked Win32 API call.
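
To make the concern concrete, here is a small Java illustration of how a -1 return value behaves under different kinds of checks. It is not a transcription of WPrinterJob.cpp; CapabilityCheck and its methods are hypothetical names, and any flag value passed in is arbitrary. It only shows the arithmetic: -1 passes a C-style truthiness test and has every bit set, so a bitwise flag test succeeds for any flag.

```java
class CapabilityCheck {
    // Mirrors a C/C++ `if (result)` test: any nonzero value counts as success.
    static boolean looseCheck(int result) {
        return result != 0;
    }

    // Mirrors the guarded style quoted above: only positive values count.
    static boolean strictCheck(int result) {
        return result > 0;
    }

    // Bitwise flag test on a raw result; -1 has all bits set in two's complement.
    static boolean hasFlag(int result, int flag) {
        return (result & flag) != 0;
    }
}
```

With a result of -1, looseCheck and hasFlag both report success while strictCheck does not, so an unguarded call could plausibly report a capability the driver never advertised. Whether that has anything to do with the hang is a separate question.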

aahlenst commented 4 years ago

I rather would expect that code to crash or to give funky results than to hang. But I have zero experience with Windows drivers and the JNI code around that, so 🤷‍♂️.

I do not have the expertise to help and it does not seem like the others have, either. Considering you have an executable test case and additional information, it might be worth a try to ask on an OpenJDK mailing list like jdk-dev. Maybe someone with expertise is inclined to respond.

tresf commented 4 years ago

> I rather would expect that code to crash or to give funky results than to hang. But I have zero experience with Windows drivers and the JNI code around that, so 🤷‍♂️.

My experience with JNI and Windows has been limited as well. In my experience, the crashes occur when symbols or registers are incorrect. Adding insult to injury, I'm not even sure how to run java.exe through a debugger. On Mac and Linux I've used the CLI tools which allow me to see the backtrace, but that still requires access to debug symbols. Assuming no one here is going to do that, I believe the next best way to debug this is to make a standalone executable (such as in Visual Studio) and run the Java 11 code as-is against the driver to catch the hang in the IDE. This is probably a good candidate since it relies mostly on Win32 APIs and a few JDK headers.

With regard to reaching out to other channels (like jdk-dev) or staying here, I think it depends on a few factors:

I'd like to clarify that symptomatically, it's 100% a support problem. The end-user can't use the JDK if this driver is installed, and that's going to punish the end-users and IT administrators trying to use these two products together. Fortunately, I've only run into this combination of Presto! and Java this one time, so perhaps someone with a vested interest in both products can push this through the correct channels at a future date. 🍻