AravisProject / aravis

A vision library for genicam based cameras
GNU Lesser General Public License v2.1
840 stars 316 forks source link

Disconnection Issues and Failed Buffers When Connecting a camera with Basler Pylon on Same Computer #871

Closed Carter12s closed 4 months ago

Carter12s commented 4 months ago

Describe the bug We have existing camera code that talks to a Basler camera using Basler's provided Pylon software. We are now integrating a 2nd camera not made by Basler into our system and decided to use Aravis to do so.

Our Pylon based software runs at a rock steady 20Hz with no hiccups by itself. Our aravis based software runs at a rock steady 400Hz with no hiccups by itself.

However when we attempt to run both cameras at the same time, our aravis based code immediately begins dropping buffers and disconnecting.

I strongly suspect aravis is receiving and misinterpreting packets intended for pylon, but do not appear to have any way to prevent this from the API.

To Reproduce Steps to reproduce the behavior:

Likely tricky... to reproduce and likely requires a Basler branded camera + another GigE camera. Both our aravis code and our pylon code follow typical examples provided by both.

Expected behavior A clear and concise description of what you expected to happen. Aravis should not be affected by other GigEVision based software running on the same PCs and the connections should proceed independent of each other.

Camera description:

Platform description:

Additional context Willing to do a decent amount of legwork to try to root cause this issue and find a fix. Any direction y'all can point me in would be appreciated.

Carter12s commented 4 months ago

Continuing to investigate and this may not be a bug in aravis, but some kind of low level kernel contention / resource contention.

I set up the two seperate processes (one reading with aravis, one reading with pylon) in two separate docker containers and used docker's macvlan driver to fully isolate their networking from each other. In this setup will still saw framerates for the aravis process dropping substantially.

Carter12s commented 4 months ago

Have attempted a wide variety of different test setups and isolation layers:

Our aravis based code runs perfectly:

The only scenario is which the aravis code starts choking and spitting out bad buffers is when the Basler Pylon code is running at the same time, and this affect persists through completely isolating the two processes networking from each other with docker. Our Pylon based code appears unaffected and its framerate does not dip when aravis code is running, but not the other way around.

Interesting the magnitude of the slowdown we observe does not change whether we compile the two processes in Debug or Release mode...

Having tried to eliminate as many possible sources of contention, as far as I can tell there must be some interaction at the kernel level between aravis and pylon that is causing misbehavior.

EmmanuelP commented 4 months ago

Did you try to increase the socket buffer size for aravis ? Did you try the packet socket ?

https://aravisproject.github.io/aravis/ethernet.html

Carter12s commented 4 months ago

Excellent leads to try! Thank you for the quick response!

Results from testing:

I am pretty suspicious about some conflict between Pylon and Aravis. The system running this has ample resources to handle both camera streams, and it appears that any connection to our Pylon camera (even one running at an extremely low framerate) causes issues with the aravis connection.

EmmanuelP commented 4 months ago

You may want to try arv-camera-test, which can help you test the different parameters. Also, try to enable the debug output, especially the stream log which will give you some statistics about the packet handling.

Carter12s commented 4 months ago

Great suggestions again. Thank you for taking the time to assist me here.

arv-camera-test luckily exhibits the exact same performance as my code does directly. It runs flawlessly while running by itself, and has copious errors while running at the same time as our Basler Pylon code.

Running by itself:

$ arv-camera-test-0.8 -n Specim-FX17e-076900060266
Looking for camera 'Specim-FX17e-076900060266'
vendor name            = Specim
model name             = FX17e
device serial number   = 076900060266
image width            = 320
image height           = 224
horizontal binning     = 2
vertical binning       = 1
exposure               = 2400 ยตs
gain                   = 3 dB
payload                = 71680 bytes
gv n_stream channels   = 1
gv current channel     = 0
gv packet delay        = 0 ns
gv packet size         = 7204 bytes
399 frames/s -    28.6 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
401 frames/s -    28.7 MiB/s
400 frames/s -    28.7 MiB/s
^C401 frames/s -    28.7 MiB/s
n_completed_buffers    = 4404
n_failures             = 0
n_underruns            = 0
n_timeouts             = 0
n_aborted              = 0
n_missing_frames       = 0
n_size_mismatch_errors = 0
n_received_packets     = 52848
n_missing_packets      = 0
n_error_packets        = 0
n_ignored_packets      = 0
n_resend_requests      = 0
n_resent_packets       = 0
n_resend_ratio_reached = 0
n_resend_disabled      = 0
n_duplicated_packets   = 0
n_transferred_bytes    = 316295280
n_ignored_bytes        = 0

Running while our Pylon code is active:

$ arv-camera-test-0.8 -n Specim-FX17e-076900060266
Looking for camera 'Specim-FX17e-076900060266'
vendor name            = Specim
model name             = FX17e
device serial number   = 076900060266
image width            = 320
image height           = 224
horizontal binning     = 2
vertical binning       = 1
exposure               = 2400 ยตs
gain                   = 3 dB
payload                = 71680 bytes
gv n_stream channels   = 1
gv current channel     = 0
gv packet delay        = 0 ns
gv packet size         = 7204 bytes
373 frames/s -    26.7 MiB/s - 18 errors
370 frames/s -    26.5 MiB/s - 19 errors
340 frames/s -    24.4 MiB/s - 31 errors
431 frames/s -    30.9 MiB/s - 17 errors
350 frames/s -    25.1 MiB/s - 17 errors
386 frames/s -    27.7 MiB/s - 12 errors
373 frames/s -    26.7 MiB/s - 30 errors
375 frames/s -    26.9 MiB/s - 21 errors
372 frames/s -    26.7 MiB/s - 28 errors
^C390 frames/s -      28 MiB/s - 10 errors
n_completed_buffers    = 3761
n_failures             = 203
n_underruns            = 72
n_timeouts             = 203
n_aborted              = 0
n_missing_frames       = 4
n_size_mismatch_errors = 0
n_received_packets     = 47202
n_missing_packets      = 1564
n_error_packets        = 79
n_ignored_packets      = 72
n_resend_requests      = 190
n_resent_packets       = 71
n_resend_ratio_reached = 135
n_resend_disabled      = 79
n_duplicated_packets   = 0
n_transferred_bytes    = 281657752
n_ignored_bytes        = 381136
Carter12s commented 4 months ago

Investigating with ARV_DEBUG=stream:3,stream-thread:3 arv-camera-test-0.8 while our Basler Pylon code is running

I can find occasional entries for:

[16:41:27.799] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 1199 at dt = 121014  
[16:41:27.699] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 20621, packet id = 11 (12 packets/frame)                           
[16:41:27.699] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 1199 (from packet 0 to 0)                                                       
[16:41:27.699] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 20870, packet id = 0 frame id = 1199   
[16:41:26.065] ๐Ÿ…ธ stream-thread> [GvStream::missing_packet_check] Maximum number of requests reached at dt = 21940, n_packet_requests = 4 (12 packets/frame), frame_id = 545

An example of a longer run of issues:

[16:41:25.929] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 21783, packet id = 7 frame id = 491                                                                                                                                                                                        [4654/92657]
[16:41:25.931] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 501                                                                                                                                                                                                                                            
[16:41:25.934] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 502                                                                                                                                                                                                                                            
[16:41:25.936] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 503                                                                                                                                                                                                                                            
[16:41:25.939] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 504                                                                                                                                                                                                                                            
[16:41:25.941] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 505                                                                                                                                                                                                                                            
[16:41:25.944] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 506                                                                                                                                                                                                                                            
[16:41:25.946] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 507                                                                                                                                                                                                                                            
[16:41:25.949] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 508                                                                                                                                                                                                                                            
[16:41:25.950] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 21545, packet id = 11 (12 packets/frame)                                                                                                                                                                                       
[16:41:25.950] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 500 (from packet 2 to 4)                                                                                                                                                                                                                    
[16:41:25.950] ๐Ÿ…ธ stream-thread> [GvStream::process_packet] Error packet at dt = 21767, packet id = 2 frame id = 500                                                                                                                                                                                                    
[16:41:25.951] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 509                                                                                                                                                                                                                                            
[16:41:25.952] ๐Ÿ…ณ stream-thread> [GvStream::missing_packet_check] Resend request at dt = 1006, packet id = 8 (12 packets/frame)                                                                                                                                                                                         
[16:41:25.952] ๐Ÿ…ณ stream-thread> [GvStream::send_packet_request] frame_id = 509 (from packet 0 to 0)                                                                                                                                                                                                                    
[16:41:25.953] ๐Ÿ…ณ stream-thread> [GvStream::process_data_leader] Received resent packet 0 for frame 509                                                                                                                                                                                                                 
[16:41:25.954] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 510                                                                                                                                                                                                                                            
[16:41:25.956] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 511                                                                                                                                                                                                                                            
[16:41:25.957] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 466 at dt = 100642                                                                                                                                                                                                                
[16:41:25.957] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 466                                                                                                                                                                                                                                                
[16:41:25.959] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 512                                                                                                                                                                                                                                            
[16:41:25.961] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 513                                                                                                                                                                                                                                            
[16:41:25.964] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 514                                                                                
[16:41:25.966] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 515                                                                                
[16:41:25.969] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 516                                                                                
[16:41:25.971] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 517                                                                                
[16:41:25.974] ๐Ÿ…ณ stream-thread> [GvStream::find_frame_data] Start frame 518                                                                                
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 467 at dt = 121496                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 467                                                                                    
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 468 at dt = 121496                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 468                                                                                    
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 471 at dt = 120974                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 471                                                                                    
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::check_frame_completion] Completed frame 472
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 472
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::check_frame_completion] Completed frame 473
[16:41:25.979] ๐Ÿ…ณ stream-thread> [GvStream::close_frame] Close frame 473
[16:41:25.979] ๐Ÿ†† stream-thread> [GvStream::check_frame_completion] Timeout for frame 474 at dt = 113030

From looking at these myself not know aravis too well all of this appears to be very "normal" logs and do look exactly like a system that is bottle-necking or have some communication issues. What I can't for the life of me figure out is why only our Basler Pylon code causes these to manifest and no other synthetic load I place the system under...

Carter12s commented 4 months ago

As a last hurrah to work around this issue I re-wrote our code that was using Pylon to talk to the Basler camera to use Aravis instead.

I now successfully have both cameras reliably communicating with the server using Aravis to talk to both of them.

I do believe there very likely a real compatibility issue somehow between the two libraries, but I cannot explain it.

The only thing out of numerous attempts to resolve the performance issue was to remove Pylon from running on the affected system...

Carter12s commented 4 months ago

Okay I'm going to go ahead and close this issue. I'm not really sure aravis is in anyway the problem here, but I am going to explain my findings in case anyone else stumbles across this issue themselves.

  1. We're attempting to run our Specim camera at a very high frame rate (400 Hz) which means there is a very limited window for any packet retries should a packet be dropped anywhere in the network stack.
  2. Aravis + Specim behaves perfectly when 0 packets are being dropped, but as soon as packets start being dropped none of the packet retries are successful and performance tanks.
  3. I believe this is largely due to the Specim camera likely having a single internal frame buffer, and very quickly having to clear that buffer to make way for the next frame (2.5ms at most). Different values for "initial-packet-timeout", "packet-timeout", and "frame-retention" did not seem to make a big difference here on the Aravis side.
  4. For whatever reason (still very unclear to me) our Pylon camera software causes a small percentage of packets for the aravis camera to be dropped whenever it is running regardless of the networking configuration, running the two cameras on different dedicated Ethernet cables, thread priority, whatever we tried. The inability of the Specim Camera + Aravis to successfully retry packets magnifies how detrimental it is to performance.
  5. My final setup has us talking to both the Specim camera and Basler camera using aravis, and in this setup (with identical cameras and networking setup) that small % of packet loss is not present, and both camera connections behave well.

Some remarks: