luc-bot opened this issue 2 years ago
Hi @luc-bot,
I once had problems with the acquisition being blocked.
To solve this, I added a buffer.queue() call after the buffer manipulation to allow the buffer to be used again in another acquisition.
Hi @mahwhat, is there a specific time after I fetch the buffer at which I can queue it? If I try queuing the buffer right after I grab the image, I get an error that "GenTL does not support this operation".
With these lines of code:
buffer = ia.fetch_buffer()
img = buffer.payload.components[0].data
buffer.queue()
If I queue the buffer right after I'm done manipulating the variable "img" then sometimes it'll get stuck in the infinite loop mentioned above.
Hi @luc-bot,
I normally enqueue the buffer after manipulation and process the image. Here's a snippet of my code:
buffer = ia.fetch_buffer()
component = buffer.payload.components[0]
_img = component.data.reshape(component.height, component.width)
self._pipeline(_img) # Here I process the image
buffer.queue()
@mahwhat Hi, thank you for the help you offered. I appreciate that. Regards, Kazunari.
I tried different methods to queue the buffer as shown in the setup page. With some new testing I think I narrowed down the cause of the error. It appears that sometimes my camera returns an error message instead of an image. In this event ia.fetch_buffer() gets stuck and I never get to the next line of code.
This took a significant amount of time to test because I have a few different cameras and the problem only occurs on certain systems; possibly a problem with the network settings?
A possible case that I can think of is one where Harvester faces an exceptional situation and forgets to queue the buffer containing the corrupted data back to the GenTL Producer. So here are two questions for you:
1. Does the issue exist in both "manual queuing" and "automatic queuing based on the with statement"?
2. If you have a traceback that can be a piece of evidence for question no. 1, please paste it; it will be valuable for the investigation.
Thanks, Kazunari.
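For reference, the two queuing styles referred to in question no. 1 look roughly like the following minimal sketch (ia is assumed to be the already-running ImageAcquirer from the snippets above; the image is copied because the buffer's memory goes back to the GenTL Producer once the buffer is queued):

import numpy as np

# Manual queuing: copy the image out, then hand the buffer back explicitly.
buffer = ia.fetch_buffer()
component = buffer.payload.components[0]
img = np.copy(component.data.reshape(component.height, component.width))
buffer.queue()  # from here on, work only with the copy

# Automatic queuing based on the with statement: the buffer is queued
# when the with block exits.
with ia.fetch_buffer() as buffer:
    component = buffer.payload.components[0]
    img = np.copy(component.data.reshape(component.height, component.width))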
As an alternative to using Harvester, I have accessed the camera with "eBUS Player", a piece of software recommended by the camera manufacturer. While switching between my code, which is built with Harvester, and this software, I have noticed that I only get stuck in the infinite loop described in my initial post on the occasions when I receive these warnings. You can see the warnings that I am receiving in the attached image.
Yes, the issue exists in both "manual queuing" and "automatic queuing based on the with statement".
Please let me know if more details would be helpful, or if you can suggest a more robust way for me to prove my hypothesis that the warning messages are related to the bug.
Thank you.
Excuse me, the comments above do not mean the issue has been resolved. I am planning to prepare suggestions for further investigation.
@luc-bot Hi, this is just an idea to try: you can prevent the fetch (or try_fetch) call from blocking you by setting a timeout period, in seconds, on the timeout parameter:
try_fetch(timeout=0.1) # the unit is [s]
After the specified period has elapsed and no valid image was delivered, the try_fetch call will return None and the fetch call will raise genicam.gentl.TimeoutException.
Harvester can't help you prevent a situation where a buffer is unexpectedly dropped, but the suggested approach above should more or less make sense for you.
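A minimal sketch of the suggested timeout-based pattern, assuming ia is the ImageAcquirer from above (the retry policy here is only an illustration):

from genicam.gentl import TimeoutException

# Non-blocking style: try_fetch returns None when nothing arrived in time.
buffer = ia.try_fetch(timeout=0.1)  # the unit is [s]
if buffer is not None:
    component = buffer.payload.components[0]
    img = component.data.reshape(component.height, component.width).copy()
    buffer.queue()
else:
    pass  # no buffer was delivered within 0.1 s; retry or give up here

# Exception style: fetch raises instead of blocking forever.
try:
    with ia.fetch(timeout=0.1) as buffer:
        img = buffer.payload.components[0].data.copy()
except TimeoutException:
    pass  # same situation, reported as an exception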
Hi @kazunarikudo, previously I was using version 1.3.2 and updated to version 1.3.6 to use try_fetch. Setting the timeout as you mentioned above seems to solve my previous issue, but this appears to have raised two new problems.
@luc-bot Hi, just a quick comment: Could you try the instruction that I mentioned here so that I can diagnose what's happening on your side? I need to confess that I do not know if there's a difference between 1.3.2 and 1.3.6, but I guess the lag you are facing comes from the fact that sufficient buffers are not being delivered. In principle, Harvester just asks a GenTL Producer if it has a buffer that is ready to be fetched. In addition, when the Producer says "yes, I have," the buffer is already on the computer, so Harvester itself never drops it. Finally, you face the consecutive Nones because there is no buffer to be fetched; the call eventually times out and returns None so as not to block execution. The consecutive timeouts just show up as None.
One more thing: I would like to encourage you to try another setup such as another computer, another camera, another cable, etc. Of course, they may never be the production components, but it should be worth checking whether something makes a difference. If such image acquisition quality were the best performance Harvester could offer, then nobody would use it. ;-) Last but not least, could you tell me if you have enabled jumbo frames? I do not want to hear that you've been using 1500 bytes, which is the default! Thanks!
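For what it's worth, here is a hedged illustration of checking the stream packet size from Python. It assumes the camera is a GigE Vision device that exposes the standard GevSCPSPacketSize feature through the remote device's node map; whether the feature is present, and which value the NIC's MTU allows, depends on your setup:

# ia: the ImageAcquirer created earlier in the thread.
node_map = ia.remote_device.node_map

# A value around 1500 usually means jumbo frames are not in use.
print(node_map.GevSCPSPacketSize.value)

# With the NIC MTU raised to 9000, a larger packet size can be requested
# before starting the acquisition.
node_map.GevSCPSPacketSize.value = 8192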
Hi, I think I am facing the same issue as the OP.
In my case I am using 2 cameras with the .cti producer from Matrix Vision (downloaded from here) in a multiprocessing environment. Since the full code is quite complex, here is a stripped-down version that produces the same behaviour:
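(What follows is a sketch of the setup described, not the original snippet: one acquisition process per camera on top of the Matrix Vision producer; the CTI path is a placeholder.)

import multiprocessing as mp
from harvesters.core import Harvester
from genicam.gentl import TimeoutException

CTI_FILE = '/opt/mvIMPACT_Acquire/lib/x86_64/mvGenTLProducer.cti'  # placeholder path

def camera_worker(camera_index, stop_event):
    h = Harvester()
    h.add_file(CTI_FILE)
    h.update()
    ia = h.create(camera_index)  # one camera per process
    ia.start()
    while not stop_event.is_set():
        try:
            buffer = ia.fetch(timeout=0.5)  # the call that occasionally never returns
        except TimeoutException:
            continue
        component = buffer.payload.components[0]
        img = component.data.reshape(component.height, component.width).copy()
        buffer.queue()
        # ... hand img over to the rest of the application ...
    ia.stop()
    ia.destroy()
    h.reset()

if __name__ == '__main__':
    stop = mp.Event()
    workers = [mp.Process(target=camera_worker, args=(i, stop)) for i in range(2)]
    for w in workers:
        w.start()
    try:
        for w in workers:
            w.join()
    except KeyboardInterrupt:
        stop.set()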
Both threads sometimes get stuck in an infinite loop at buffer = ia.fetch(timeout=0.5). Unfortunately it happens rarely and randomly, so it's hard to know exactly what's triggering the behavior. When the threads are stuck they generate no error and become non-responsive, so when I quit the application they remain active and I have to manually kill all the active Python processes to get rid of them.
Anyway, I noticed that it's more likely to happen when there are many parallel processes (in my main code there are at least 6 more processes and it happens almost 50% of the time) or when the previous run has crashed or has been force closed.
I tried to run this code on both Windows 10 and Debian 11, on 3 different machines and a VM, and the infinite loop happens ~5% of the time. If I enable Harvester's logs as suggested here, I can see this line repeated over and over (with different memory addresses) while stuck in the infinite loop:
2022-08-03 14:01:06,492 :: harvesters :: WARNING :: incomplete or not available; discarded: <genicam.gentl.Buffer; proxy of <Swig Object of type 'std::shared_ptr< GenTLCpp::Buffer > *' at 0x7f5687869780> > :: MER-131-75GM-P(00:21:49:03:ae:27)_Stream_0 :: MER-131-75GM-P(00:21:49:03:ae:27) :: 00:0c:29:b2:d9:ce_ens33 :: {AF542A5A-E6D3-4f3d-9908-4A89AE21105A} :: <genicam.gentl.GenTLProducer; proxy of <Swig Object of type 'std::shared_ptr< GenTLCpp::GenTLProducer > *' at 0x7f5687869f90> >
However, the test script suggested in that same post gives me an OK result every time (except once, when I somehow managed to get an error right after I force-closed a run that was stuck, but unfortunately I've lost that log). Here's the log from it, in case it can be helpful:
Some details that may be useful:
After some testing I found out that using the .wait() method on a multiprocessing.Event increases the chances of the fetch method getting stuck. After I removed all the wait calls from my original code, it got stuck with the same probability as the demo code I proposed, ~5-10% of the time.
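For context, a hypothetical illustration of the pattern in question (not the original code): a blocking Event.wait() sitting in the same process that fetches, versus the non-blocking check left over after the wait calls were removed:

import multiprocessing as mp

trigger = mp.Event()  # hypothetical event shared with a coordinating process

# Pattern that appeared to raise the probability of a hang:
trigger.wait()                  # blocks until another process calls trigger.set()
buffer = ia.fetch(timeout=0.5)  # ia: the ImageAcquirer owned by this process

# Pattern left after removing the .wait() calls:
if trigger.is_set():
    buffer = ia.fetch(timeout=0.5)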
I am thinking of some sort of "desynchronization" happening between the process and the GenTL producer.
Also, I tried using run_as_thread=True just to check it out. It shows the same apparently random behavior, but this time, when relaunching the script after it got stuck and was force-quit, I always get this error:
Traceback (most recent call last):
File "testing/harvesters_fetch_buffer_bug.py", line 72, in camera_worker
ia = self.harvester.create(camera_id)
File "/home/test/.venv/lib/python3.8/site-packages/harvesters/core.py", line 2899, in create
return self._create_acquirer(device_proxy=device_proxy, config=config)
File "/home/test/.venv/lib/python3.8/site-packages/harvesters/core.py", line 2916, in _create_acquirer
device_proxy.open(_privilege)
File "/home/test/.venv/lib/python3.8/site-packages/harvesters/core.py", line 214, in m
return getattr(self._source_object, attribute)(*args)
File "/home/test/.venv/lib/python3.8/site-packages/genicam/gentl.py", line 3189, in open
return _gentl.Device_open(self, accessFlags)
_gentl.AccessDeniedException: GenTL exception: Requested operation is not allowed. (Message from the source: ) (ID: -1005)
which I can only recover from by killing all the active Python processes (e.g. using pkill python).
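One mitigation worth trying for the relaunch failure, purely as an assumption on my part: make sure the acquirer and producer are released even on an abnormal (but not SIGKILLed) shutdown, so the device is not left open with exclusive access:

import atexit
import signal
import sys

# ia, h: the ImageAcquirer and Harvester instances created by the script.
def release(ia, h):
    # Close the device and unload the producer so the next run can open them again.
    try:
        ia.stop()
        ia.destroy()
    finally:
        h.reset()

atexit.register(release, ia, h)                               # runs on normal interpreter exit
signal.signal(signal.SIGTERM, lambda sig, frame: sys.exit(0)) # turn SIGTERM into a clean exit

Of course, this does nothing when the processes are killed with SIGKILL (pkill -9).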
@michezio Did you solve your problem?
Unfortunately I'm not working on that project anymore.
Since I couldn't find a solution and it was an application that needed to run 24/7 unmanaged, I "solved" it by implementing a watchdog process that monitors the heartbeat of the processes that use image acquirers. When they get stuck, the watchdog kills and restarts them.
Actually, I decided to reboot the whole system when it happened, since even after restarting the processes the GenTL producer was sometimes unusable, so a full reboot was the only option (and in my case it meant at most 15 seconds of downtime and was much more reliable).
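A minimal sketch of that kind of watchdog (the heartbeat mechanism and restart policy here are illustrative assumptions, not the original implementation):

import multiprocessing as mp
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before a worker is considered stuck

def acquisition_worker(heartbeat):
    # ... set up Harvester / ImageAcquirer here ...
    while True:
        heartbeat.value = time.time()  # prove the loop is still alive
        # buffer = ia.fetch(timeout=0.5); process; buffer.queue() ...
        time.sleep(0.1)

def watchdog():
    heartbeat = mp.Value('d', time.time())
    worker = mp.Process(target=acquisition_worker, args=(heartbeat,), daemon=True)
    worker.start()
    while True:
        time.sleep(1.0)
        if time.time() - heartbeat.value > HEARTBEAT_TIMEOUT:
            worker.kill()   # the fetch call is presumed stuck
            worker.join()
            heartbeat.value = time.time()
            worker = mp.Process(target=acquisition_worker, args=(heartbeat,), daemon=True)
            worker.start()

if __name__ == '__main__':
    watchdog()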
I believe I am getting this error too, except that instead of 5-10% of the time as mentioned above, it happens every single time when the frame rate of my camera (~4000x3000 pixels) is above 1 fps. If I set the frame rate to 1 fps, then I don't get this error.
The function ia.fetch_buffer() gets stuck in an infinite loop, seemingly at random. I built a simple GUI which connects to a Photonfocus MV1312 camera, with a start button, a stop button, and a window to display the images. If I start and stop the process multiple times, it will get stuck on ia.fetch_buffer() seemingly at random. With the example:
print('a')
ia.fetch_buffer()
print('b')
'b' never gets printed.