Closed tknopp closed 10 years ago
Is it a calling-convention problem? I know that with ODBC and other DLLs that get tight with Windows internals, the calling convention has to be specified as stdcall, right after the (:function, lib) tuple. E.g. from ODBC:
function SQLFetchScroll(stmt::Ptr{Void},fetch_orientation::Int16,fetch_offset::Int)
    @windows_only ret = ccall( (:SQLFetchScroll, odbc_dm), stdcall,
                               Int16, (Ptr{Void},Int16,Int),
                               stmt,fetch_orientation,fetch_offset)
    @unix_only ret = ccall( (:SQLFetchScroll, odbc_dm),
                            Int16, (Ptr{Void},Int16,Int),
                            stmt,fetch_orientation,fetch_offset)
    return ret
end
I don't think so. The calling convention that has to be used is "stdcall", and I use that in the ccall. Furthermore, in the "self-written" DLL, which is actually a released product, the NI DLL is lazily loaded and the stdcall calling convention is used there as well.
Unfortunately this is a hard-to-debug problem, as the NI DLL is closed source. I have attached to a debug build of the "self-written" DLL using Visual Studio and can step until the call to the NI function, which then hangs deep inside some MS API. Again, if I do the same from a C program there is no issue.
Did you try compiling your self-built DLL with MinGW? Also, there are a number of postings on the NI list about using it with MinGW, which may apply to Julia as well (e.g. how to extract libs).
I've also seen mention of two types of error handling for NIDAQMX - "simple" and "general". I didn't find a description of what these actually mean, but this could be causing or at least masking the problem. For example, if the NI code is throwing a C++ exception that is not supported by the runtime. [edit: although one would assume that the C interface does not throw C++ exceptions, it could still be written in C++ underneath and conflict with the runtime]
Unfortunately, building the self-built DLL with MinGW is not possible, as it is part of a larger project.
But maybe this is really an issue with different C++ stdlibs. That would indicate a larger problem on Windows, though, as I would assume that MSVC/Intel-built DLLs are the standard on Windows and not the exception. Maybe it's necessary to compile Julia with MSVC then.
Well, how about a very simple stub C code that just makes a few calls like the one that hangs from Julia, and try compiling that with MinGW. If you can get that to work (perhaps using the extraction technique I linked) then it may help to isolate the issue.
Yes, that seems to be the best idea. I will install MinGW and try that. A further check is to compile an MSVC DLL with the stdcall convention and see whether this fails. The issue seems to be that the stdcall conventions of MinGW and MSVC are not(!) compatible.
I am not sure if this is the same issue, but the following code, which relies only on kernel32.dll, crashes:
buf = zeros(Uint8,1024); ccall( (:GetComputerNameA,"Kernel32"), stdcall, Int32, (Ptr{Uint8}, Int32), buf, 1024)
Looking at http://llvm.org/docs/doxygen/html/namespacellvm_1_1CallingConv.html#a4f861731fc6dbfdccc05af5968d98974 there seems to be a flag X86_64_Win64. Might it be that this needs to be used in ccall for 64-bit Windows?
cc @vtjnash
Win64 only has one calling convention
Type sig for your last example is wrong -- the second parameter needs to be Ptr{Int}, not Int32
Will look more tonight
Sorry for that, using the correct second parameter works. So it seems to be no general issue with non-mingw libs.
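For reference, the contract behind that fix can be sketched in C. GetComputerNameA takes its length argument by pointer: the caller passes the buffer capacity in, and the function writes the name length back out. The sketch below uses a hypothetical stand-in, get_name, with made-up data rather than the real Windows API, so it runs on any platform; the helper name_length is likewise only illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an API shaped like GetComputerNameA:
   `size` is an in/out pointer (buffer capacity in, name length out). */
int get_name(char *buf, uint32_t *size)
{
    static const char name[] = "EXAMPLE-PC";   /* made-up value */
    uint32_t needed = (uint32_t)sizeof(name);  /* includes the NUL */

    assert(buf != NULL && size != NULL);
    if (*size < needed) {
        *size = needed;   /* report the required capacity */
        return 0;         /* failure */
    }
    memcpy(buf, name, needed);
    *size = needed - 1;   /* characters written, excluding the NUL */
    return 1;
}

/* Correct call: pass the ADDRESS of a length variable. Passing the
   integer 1024 itself would make the callee dereference address 1024. */
uint32_t name_length(char *buf, uint32_t capacity)
{
    uint32_t size = capacity;
    return get_name(buf, &size) ? size : 0;
}
```

Declaring the argument as a plain integer in a ccall signature corresponds to the second, broken case: the callee treats the value 1024 as an address and faults when it reads or writes through it.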
@tknopp do you have access to a recent 0.3 build? I'm thinking that https://github.com/JuliaLang/julia/commit/524e305872f4a0946a283bc75308c13462432285 (and followup commits to debuginfo.cpp) may have fixed this. Also, can you post the ccall line you are using?
Is there a nightly build of Julia? I have not yet set up MinGW.
I use the ccall
ccall( (:DAQmxResetDevice, "C:\\Windows\\System32\\nicaiu.dll"), stdcall, Int32, (Ptr{Uint8},),"Dev3")
The call hangs no matter which calling convention I use.
When I call DAQmxResetDevice indirectly, I have a library "MyLib.dll" with a function doMeasurement that lazily loads DAQmxResetDevice using GetProcAddress. When I attach to "julia-readline", doMeasurement is called fine, but when invoking the (valid) function pointer of DAQmxResetDevice the call hangs. When I break, I get the following call stack:
ntdll.dll!00000000779d15fa()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
KernelBase.dll!000007fefdb11203()
nipalu.dll!0000000006c930b2()
nipalu.dll!0000000006c92d41()
nidmxfu.dll!0000000011668850()
nidmxfu.dll!0000000011668258()
nidmxfu.dll!0000000011668dfb()
nidmxfu.dll!00000000117a6327()
nidmxfu.dll!000000001178a837()
nidmxfu.dll!00000000117bf17e()
nidmxfu.dll!000000001178408e()
nidmxfu.dll!000000001178453c()
nidmxfu.dll!0000000011755445()
nidmxfu.dll!0000000011752945()
nidmxfu.dll!0000000011689e1a()
nidmxfu.dll!00000000118f65ca()
nicaiu.dll!00000001802648ac()
So it is at least in the right DLL. I really cannot see why this has anything to do with Julia, but doing it from C or Python (using ctypes) works without issues.
I have tried @ihnorton's suggestion and compiled a small C++ program with MinGW, and this works fine:
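#define WIN32_LEAN_AND_MEAN
#include <windows.h>
// Signature of DAQmxResetDevice, resolved at run time.
typedef int (__stdcall* ResetDevice_t)(const char[]);
int main()
{
    HMODULE dll_module = LoadLibraryA("nicaiu.dll");
    if (!dll_module) return 1;    // DLL could not be loaded
    ResetDevice_t _ResetDevice = reinterpret_cast<ResetDevice_t>(GetProcAddress(dll_module, "DAQmxResetDevice"));
    if (!_ResetDevice) return 1;  // symbol not found in the DLL
    int result = (*_ResetDevice)("Dev3");
    return result;                // DAQmx status code; 0 means success
}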
@staticfloat did you have a bleeding edge version that can be uploaded to julialang?
@JeffBezanson are you using 0.2 for IAP? I can back-port this fix to the release branch if so.
I am currently trying to build Julia master with MinGW using README.windows, but make fails early with
make: *** No rule to make target `/home/tknopp/julia/usr/bin', needed by `release'.  Stop.
Unfortunately I am not a make expert.
OK, it now starts to compile. It seems that in step 5 of README.windows one has to replace the make.exe in C:/mingw-builds/msys/bin rather than putting it into mingw64\bin. Further, I had to replace "C:/" with "/C/".
It's generally good to back-port any pure bug fixes, if it doesn't take too much effort.
I still have not got Julia compiled under Windows using MinGW, due to a crash when building LLVM (tblgen.exe crashes). So if someone has a recent build of Julia, that would be great.
@tknopp I started a cross-compile this morning but had to run out for work. I can send you a binary when I get home tonight (EST) ..unless somebody else has one.
@vtjnash My windows builds still die due to #5142
@JeffBezanson that patch is not purely a bug fix, and it is not entirely easy to backport. However, this may tip the balance in favor of making it happen.
I sent @tknopp a build to test; can put it up somewhere if there is interest.
@ihnorton Thanks! But where did you send it?
@tknopp your googlemail from the mailing list... re-sent just now.
Thanks Isaiah. My findings:
jl_eval_string, as I do not have a working REPL compiled with the Intel compiler.
@tknopp does your same C program work when run against the Julia master version from ihnorton?
@vtjnash: Yes, calling jl_eval_string from a C program compiled with the Intel compiler, using the MinGW libjulia.dll from @ihnorton, works. This is so weird. This basically leaves the REPL, but I cannot think of anything there that would cause this.
The REPL uses multiple tasks; your example code presumably does not. It may be helpful to load the symbol table for ntdll.dll in your debugger so that you get a valid stack trace.
When I load symbols I get the following for the windows libraries
ntdll.dll!ZwDelayExecution() + 0xa bytes
KernelBase.dll!SleepEx() + 0xb3 bytes
I don't know if I could get debug symbols for the NI libs.
You said that the REPL uses multiple tasks. But I thought that this is serialized code. Or are there threads involved anywhere?
Am I correct in assuming that your Intel compiled code was 64-bit and using the native platform setjmp? If so, that would explain why the REPL doesn't work (or tasks, or exception handling, for that matter). I hadn't really paid attention to the Windows.mk file, but I see it doesn't seem to list the setjmp/longjmp assembly files.
@vtjnash This is funny. When trying to get MSVC to work, I asked myself why I had no issues with setjmp when compiling with the Intel compiler. The reason is that julia.h has changed and is now wrong:
#if defined(_OS_WINDOWS_)
#if defined(_COMPILER_MINGW_)
int __attribute__ ((__nothrow__,__returns_twice__)) jl_setjmp(jmp_buf _Buf);
__declspec(noreturn) __attribute__ ((__nothrow__)) void jl_longjmp(jmp_buf _Buf,int _Value);
#else
int jl_setjmp(jmp_buf _Buf);
void jl_longjmp(jmp_buf _Buf,int _Value);
#endif
#define jl_setjmp_f jl_setjmp
#define jl_setjmp_name "jl_setjmp"
#define jl_setjmp(a,b) jl_setjmp(a)
#define jl_longjmp(a,b) jl_longjmp(a,b)
#else
// determine actual entry point name
while about 1-2 months ago it was:
#if defined(_OS_WINDOWS_)
#if defined(_COMPILER_MINGW_)
int __attribute__ ((__nothrow__,__returns_twice__)) jl_setjmp(jmp_buf _Buf);
__declspec(noreturn) __attribute__ ((__nothrow__)) void jl_longjmp(jmp_buf _Buf,int _Value);
#define jl_setjmp_f jl_setjmp
#define jl_setjmp(a,b) jl_setjmp(a)
#define jl_longjmp(a,b) jl_longjmp(a,b)
#else
#define jl_setjmp_f setjmp
#define jl_setjmp(a,b) setjmp(a)
#define jl_longjmp(a,b) longjmp(a,b)
#endif
#else
// determine actual entry point name
But this is more of a side note. As both the libjulia compiled with MSVC and the one compiled with Intel work in my C program, the issue seems to be something else.
@vtjnash, @ihnorton: I have found a solution to this problem: upgrade the DAQmx lib to the most recent one. This is of course not completely satisfying, but if this is a multithreading issue in that lib, I can't see that we can do anything about it.
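As background for the setjmp discussion: a runtime can implement exception-style control flow by saving a context with setjmp and unwinding to it with longjmp, which is why a mismatched setjmp can break exception handling and tasks. The following is a minimal sketch in plain C with hypothetical names (handler, risky, try_call); it is not Julia's actual implementation, which keeps its own handler stack.

```c
#include <setjmp.h>

/* Hypothetical single handler context; a real runtime keeps a stack. */
static jmp_buf handler;
static volatile int thrown_code;

/* "Throws" by jumping back to the saved context when fail_code != 0. */
void risky(int fail_code)
{
    if (fail_code != 0) {
        thrown_code = fail_code;
        longjmp(handler, 1);  /* does not return */
    }
}

/* Returns 0 if risky() completed, otherwise the code it "threw". */
int try_call(int fail_code)
{
    if (setjmp(handler) == 0) {  /* 0 on the direct call */
        risky(fail_code);
        return 0;
    }
    return thrown_code;          /* reached via longjmp */
}
```

Both sides must agree on the jmp_buf layout and on which setjmp/longjmp pair is used, which is the point of the jl_setjmp macros quoted above.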
Ok, it worked after upgrading the installer but before restarting windows. Now after restart it hangs again.
What happens when you ccall the stub function suggested by @ihnorton? Perhaps something like this
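#include <stdio.h>
#include <NIDAQmx.h>  /* NI-DAQmx C API header; declares DAQmxResetDevice */

/* Wrapper that logs before and after the call, to locate the hang. */
int myDAQmxResetDevice(char *name)
{
    printf("About to reset device for %s\n", name);
    fflush(stdout);  /* flush now, in case the next call never returns */
    int status = DAQmxResetDevice(name);
    printf("status = %d\n", status);
    fflush(stdout);
    return status;
}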
and ccall it; it might give you a clue about when the hang is happening.
Can you try again now that the Scheduler has been removed?
Yes will try tomorrow when I have a device at hand (provided that the nightly is working).
Actually, I cannot find a recent nightly. I thought that there were download links at http://status.julialang.org/. The prerelease from http://julialang.org/downloads/ is 4 days old and thus does not include the removed scheduler.
Just an update that this still does not work with the recent 0.3 prereleases. I have tested both the 32-bit and the 64-bit versions.
I have tested this with a NI USB a/d device and the latest NIDAQ library, and see the same result.
I compiled Tim's code fragment under MSVC, and then ran Julia under the Visual Studio debugger. Obviously this is useless for Julia itself, but it allows setting a breakpoint in my DLL. As soon as I step through the DAQmxResetDevice call, the Julia window is raised again - the call never returns, even when I interrupt Julia (which works fine). I've been able to step into the disassembly of the NI libraries for a while, back and forth into critical sections, etc., but haven't reached the point where it gives control back to Julia.
As a workaround, you could try switching to DAQmxBase. While the documentation often tries to redirect you to DAQmx, my experience was that the base version was more cross-platform. I am not sure whether they even share the same codebase. In my testing, the DAQmx code appeared to use a worker thread and was best suited for interaction through their GUI, whereas DAQmxBase was better suited for access from C (the cross-platform ability was also a requirement for me at the time).
If you run code without the repl, does it work? Perhaps the julia interrupt signal handlers are getting in the way.
Thanks @ihnorton so much for testing this. This at least gives me evidence that I am not crazy seeing this bug :-)
In my code base the DAQmxResetDevice call is located much deeper, within several function calls, and I was also able to step until DAQmxResetDevice, where it hangs.
Currently this issue is not super important for me. But still given that Python using ctypes does not face this issue it would be nice to determine whether there is some fundamental problem in Julia. Something within Julia has to trigger it.
Regarding Jameson's theory about threads, what I observed was four new threads being spawned after entering DAQmxResetDevice, three of them waiting and ending, and then there was still one left when control jumped back to Julia. I will try building a version without signal handlers, and also using Visual Studio 2012, which seems to have somewhat better multi-threaded debugging support than 2008.
yeehaa. I just downloaded the latest Windows binary and the issue disappeared. I tried some weeks ago and it was still there. @ihnorton: Maybe you could confirm? I am closing this now as I am not seeing it anymore. Thanks to whoever fixed it :-)
Perhaps this too was related to the openblas permissions/leaked handle issue?
Having a strange issue with ccall on Windows 7 (64-bit, Julia 0.2). I am using a National Instruments data acquisition card and am trying to call a function of the API via ccall (http://zone.ni.com/reference/en-XX/help/370471W-01/daqmxcfunc/daqmxresetdevice/). When I do this from C or from Python using ctypes, this works fine. When I use ccall, the function hangs and does not return. Even when I call the function indirectly via a self-built DLL the function hangs, while there is no issue when using this from C/Python. I really have no idea what's going on there, or whether this is a Julia issue at all. Can it be that there are problems when calling non-MinGW DLLs from a Julia that was built with MinGW?