google-code-export / nativelibs4java

Automatically exported from code.google.com/p/nativelibs4java

Possible multi-threading issue (hang when second thread tries to invoke library method) #53

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am not sure what the actual cause is, but both implementations hang in the 
described situation.

I am attaching the current sources (please launch Ex*.java from the *.example 
package).
In the AviCap case, it hangs in VideoCaptureDevice.java, line 172.
In the Pcap case, it hangs in NetworkDevice.java, line 214.

Also, please check the NetworkDevice#getIP() method. Two commented-out lines 
there (which read a Pointer<Byte>) lead to a JVM crash or 100% CPU usage.

The file is attached to issue #50.

Original issue reported on code.google.com by andrei.s...@gmail.com on 8 Mar 2011 at 4:43

GoogleCodeExporter commented 9 years ago
Hi Andrei,

If the deadlock issue also affects the JNA implementation, the odds are low 
that BridJ is the culprit.

Have you looked at articles that cover multithreading issues in the libraries 
you're mapping?
- "Be careful with pcap_compile. It’s not thread safe (as of WinPcap 
4.1beta5)": 
http://sharkfest.wireshark.org/sharkfest.09/DT5_Varenni_WinPcapDosDonts.pptx
- http://www.codeproject.com/KB/audio-video/VFWWebcam.aspx

You can make a BridJ-bound library mostly thread-safe by just adding 
'synchronized' keywords to all of the method bindings, or using the JNAerator 
-synchronized switch.
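
As a rough illustration of the 'synchronized' approach, here is a minimal, self-contained sketch. It does not use BridJ at all: rawCall() is a hypothetical stand-in for a non-thread-safe native binding, and the class and method names are made up for the example. It only shows that a synchronized wrapper serializes callers, so the stand-in never sees two threads inside it at once.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedBindingSketch {
    // Number of threads currently inside the (simulated) native call.
    static final AtomicInteger inside = new AtomicInteger();
    // Number of times a thread entered while another one was still inside.
    static final AtomicInteger overlaps = new AtomicInteger();

    // Hypothetical stand-in for a raw binding that must not be entered concurrently.
    static void rawCall() {
        if (inside.incrementAndGet() > 1) {
            overlaps.incrementAndGet();
        }
        Thread.yield(); // widen the window in which an overlap could occur
        inside.decrementAndGet();
    }

    // 'synchronized' makes callers take turns on the class monitor.
    public static synchronized void safeCall() {
        rawCall();
    }

    // Calls safeCall() from several threads and returns the number of overlaps seen.
    public static int run(int threads, int callsPerThread) {
        overlaps.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    safeCall();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return overlaps.get();
    }

    public static void main(String[] args) {
        System.out.println("overlapping calls: " + run(4, 1000)); // prints "overlapping calls: 0"
    }
}
```

Note that this only protects against concurrent entry. It does not help if the native library requires all calls to come from one particular thread.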

Regarding the 100% CPU issue, this is most certainly a bug in BridJ or a wrong 
mapping, I'll investigate ASAP.

For simpler tracking of the issue, I'm re-attaching the bindings you've 
attached in issue #50 (please let me know if you want them removed from here).

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 8 Mar 2011 at 6:35

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Olivier,

JNA works perfectly in these cases (both AviCap and Pcap).
The JNA Pcap wrapper works on Windows, Linux and Mac.

As for the attachment: this code is supposed to be open source.

Regards, Andrei.

Original comment by andrei.s...@gmail.com on 8 Mar 2011 at 6:46

GoogleCodeExporter commented 9 years ago
Hehe, thanks for the clarification :-)

For the pcap_compile hang, external synchronization appears to be compulsory 
(http://www.winpcap.org/pipermail/winpcap-bugs/2008-July/000705.html).
Some multithreading issues might just go unnoticed in JNA, given that you're 
using the interface mode (at least an order of magnitude slower than JNA's 
direct mode or than BridJ, in normal or raw modes).

Original comment by olivier.chafik@gmail.com on 8 Mar 2011 at 7:14

GoogleCodeExporter commented 9 years ago
The logic of the AviCap and Pcap JNA wrappers is the same:
Thread 1 uses the library to enumerate the available video/packet capture devices.
Thread 2 initializes the selected device and starts video/packet capture.

So only one thread calls pcap_compile.

Regards, Andrei.

Original comment by andrei.s...@gmail.com on 8 Mar 2011 at 7:26

GoogleCodeExporter commented 9 years ago
Hi Olivier,

The issue is probably not related to multi-threading... I have attached a very 
basic AviCap example that leads to 100% CPU usage.

Regards, Andrei.

Original comment by andrei.s...@gmail.com on 16 Mar 2011 at 7:40

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Andrei,

Thanks a lot for the details, I can now reproduce the issue (which only 
happens with release builds, so that gives me quite a few leads ;-)).

Working on it, stay tuned !
Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 18 Mar 2011 at 6:23

GoogleCodeExporter commented 9 years ago
Hi Andrei,

Your avicap bindings need to declare that the __stdcall convention should be 
used, with an annotation on the AviCap32 class (or on each of its native 
functions):

@Convention(Convention.Style.StdCall)

Might not fix all the issues yet, but it should be a start :-)
Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 18 Mar 2011 at 8:14

GoogleCodeExporter commented 9 years ago
Thanks! I forgot about StdCall. Nevertheless, 100% CPU doesn't look good. Is 
there any possibility of throwing an exception instead?

For some reason the method returns null :/

Regards, Andrei.

Original comment by andrei.s...@gmail.com on 18 Mar 2011 at 8:30

GoogleCodeExporter commented 9 years ago
Hi Olivier,

Returned "null" was my bug. Video driver handler is returned correctly now :)

But it seems the BITMAPINFOHEADER fields are read incorrectly from native code.
240x0 is returned instead of 320x240 (BITMAPINFOHEADER.biWidth() x 
BITMAPINFOHEADER.biHeight()).

Also, is it correct to get the callback method pointer using the 
"callback.getPointer()" method?

Updated project is attached. Main class is TestAviCapVideoCapturer. Windows 
environment.

Regards, Andrei.

Original comment by andrei.s...@gmail.com on 2 Apr 2011 at 4:50

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Andrei,

So the bug here is that BridJ does not handle the @CLong annotations properly 
when computing struct layouts.
A workaround is to replace all these long fields with int fields before the 
JNAeration (on Windows, a C long is the same size as an int on both 32-bit and 
64-bit versions of the OS anyway).
I'm working on a fix :-)
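
To make the effect of a mis-sized C long concrete, here is a small self-contained sketch. It does not use BridJ, and the class and helper names are hypothetical. It fills a buffer the way Windows lays out the start of a BITMAPINFOHEADER (4-byte long), then reads biWidth and biHeight once with 4-byte longs and once assuming 8-byte, 8-byte-aligned longs. Under that assumption the second read yields 240x0 for a 320x240 header; whether the actual bug took exactly this path is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CLongLayoutSketch {
    // Start of a BITMAPINFOHEADER as laid out on Windows (C long = 4 bytes).
    static ByteBuffer header(int width, int height) {
        ByteBuffer b = ByteBuffer.allocate(40).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(0, 40);             // biSize (DWORD)
        b.putInt(4, width);          // biWidth (LONG)
        b.putInt(8, height);         // biHeight (LONG)
        b.putShort(12, (short) 1);   // biPlanes (WORD)
        b.putShort(14, (short) 24);  // biBitCount (WORD)
        b.putInt(16, 0);             // biCompression (DWORD), 0 = BI_RGB
        return b;
    }

    // Rounds offset up to the next multiple of alignment.
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Offset of the index-th long field (0 = biWidth, 1 = biHeight) that follows
    // the 4-byte biSize, for a given assumed size of C long.
    static int longFieldOffset(int index, int longSize) {
        return align(4, longSize) + index * longSize;
    }

    // Reads width x height, taking the low 32 bits of each long field.
    public static String read(ByteBuffer b, int longSize) {
        int w = b.getInt(longFieldOffset(0, longSize));
        int h = b.getInt(longFieldOffset(1, longSize));
        return w + "x" + h;
    }

    public static void main(String[] args) {
        ByteBuffer b = header(320, 240);
        System.out.println(read(b, 4)); // prints "320x240"
        System.out.println(read(b, 8)); // prints "240x0"
    }
}
```

With 8-byte longs the width read lands on biHeight and the height read lands on biCompression, which is why replacing the fields with ints sidesteps the problem on Windows.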

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 10 Apr 2011 at 11:18

GoogleCodeExporter commented 9 years ago
(attached a fixed BITMAPINFOHEADER with all the @CLong fields replaced by ints)

Original comment by olivier.chafik@gmail.com on 10 Apr 2011 at 11:19

Attachments:

GoogleCodeExporter commented 9 years ago
Hi again,

Revision #1878 solves the @CLong (& @Ptr) sizing issue (I've redeployed 
0.5-SNAPSHOT), hopefully getting things a little better :-)

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 10 Apr 2011 at 11:33

GoogleCodeExporter commented 9 years ago
About the 100% CPU when the calling convention is incorrect, I'm afraid there's 
little I can do... In theory, BridJ should now pick the stdcall convention 
automatically if the name was mangled in the stdcall way, but if it's not I 
have no way to detect the convention (even looking at the assembly code of the 
target function).
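
For reference, a minimal sketch of what recognizing a name "mangled in the stdcall way" can look like, assuming the usual 32-bit Windows decoration _name@N, where N is the number of argument bytes. The class and method names are hypothetical and this is not BridJ's actual code. An undecorated export such as capCreateCaptureWindowA gives no hint, which is the case where an explicit annotation is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StdCallNameSketch {
    // Optional leading underscore, a C identifier, '@', then the argument byte count.
    private static final Pattern STDCALL =
            Pattern.compile("^_?([A-Za-z_][A-Za-z0-9_]*)@(\\d+)$");

    // True if the exported symbol name carries stdcall-style decoration.
    public static boolean looksStdCall(String exportedName) {
        return STDCALL.matcher(exportedName).matches();
    }

    // Argument byte count encoded in the name, or -1 if the name is not decorated.
    public static int argumentBytes(String exportedName) {
        Matcher m = STDCALL.matcher(exportedName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(looksStdCall("_foo@8"));                  // prints "true"
        System.out.println(argumentBytes("_foo@8"));                 // prints "8"
        System.out.println(looksStdCall("capCreateCaptureWindowA")); // prints "false"
    }
}
```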

I've updated the FAQ with an entry about this :-) 
(http://code.google.com/p/bridj/wiki/FAQ)

As for this bug, am I right to consider it fixed ? (please feel free to open 
other separate issues for new problems)

Cheers
--
zOlive

Original comment by olivier.chafik@gmail.com on 10 Apr 2011 at 11:40