Closed mkrmasch closed 2 years ago
@mkrmasch What version of DNNE are you using? This was discovered recently and fixed in https://github.com/AaronRobinsonMSFT/DNNE/pull/112 (1.0.30).
Actually, we are using 1.0.30 and still found this issue.
@mkrmasch Well that is unfortunate. Can you please share the platform details you are using (OS, CPU) and the stacktrace captured in the debugger?
In the meantime the CLR can be preloaded manually.
Hey, actually it is hard to provide a useful stack trace: the app hangs inside the create call, and if I break in the debugger I do not end up with
I tried to use DNNE_CALLTYPE try_preload_runtime(), but this is somewhat unstable and causes the app to hang if I make parallel calls.
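For anyone reading along, here is a minimal sketch of one way to make sure the preload runs exactly once even when several threads reach it at the same time. It is not taken from DNNE itself. `try_preload_runtime_stub`, `ensure_runtime_loaded`, and the state names are hypothetical, and the stub stands in for the real `try_preload_runtime()` export so the snippet compiles on its own. The sketch assumes a return value of 0 means success. Whether such a guard avoids the hang described here is not confirmed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the DNNE export so this sketch compiles alone.
   In a real build, include the DNNE-generated header and call
   try_preload_runtime() instead. Assumption: 0 means success. */
static int preload_calls = 0;
static int try_preload_runtime_stub(void)
{
    ++preload_calls;   /* counts how often the preload actually ran */
    return 0;
}

enum { NOT_STARTED, IN_PROGRESS, SUCCEEDED, FAILED };
static atomic_int runtime_state = NOT_STARTED;

/* Returns true once the runtime has been preloaded successfully.
   Only the first caller performs the preload; concurrent callers wait. */
bool ensure_runtime_loaded(void)
{
    int expected = NOT_STARTED;
    if (atomic_compare_exchange_strong(&runtime_state, &expected, IN_PROGRESS)) {
        /* This thread won the race and is the only one that preloads. */
        int rc = try_preload_runtime_stub();
        atomic_store(&runtime_state, rc == 0 ? SUCCEEDED : FAILED);
    } else {
        /* Another thread is (or was) preloading: wait until it finishes. */
        while (atomic_load(&runtime_state) == IN_PROGRESS) {
            /* busy-wait; crude, but keeps the sketch dependency-free */
        }
    }
    return atomic_load(&runtime_state) == SUCCEEDED;
}
```

Each worker thread would call `ensure_runtime_loaded()` before its first call into an export. The busy-wait is only there to keep the sketch free of platform dependencies. `pthread_once`, C11 `call_once`, or `InitOnceExecuteOnce` on Windows would be the more usual choice. An even simpler option is to do the preload on the main thread before any worker threads are started.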
@mkrmasch I am going to assume that mgcore_SH_19_0.dll is the binary generated by DNNE. However, I don't believe there is any call to those APIs in DNNE generated code. This is an odd issue if that is the stack being observed. The PDB for the generated binary should be copied to the output, so can you confirm version 1.0.30 is being used? Perhaps this is an assert and that is how it is implemented on Win32 - that would be surprising to me though.
I will try to reproduce this locally today.
@mkrmasch Yep, I found it. Thanks for reporting this. I will put out a new package shortly.
@mkrmasch Package 1.0.31 has been published https://www.nuget.org/packages/DNNE/1.0.31. Thank you for reporting this issue.
Hey @AaronRobinsonMSFT, that worked, thank you so much!
Having an application that makes parallel calls into the DNNE/.NET unmanaged/managed stack, I sometimes see the app stall: it does not return from an initial call into the C API. From the debugger I can at least see that the problem lies outside the managed code. I can also work around the issue by avoiding parallel calls the first time, which suggests a race condition when the DLL is being loaded into memory while concurrent calls arrive.
Does anyone have an idea if it would be possible to avoid that in the unmanaged layer already?