brunoseivam opened this issue 8 years ago
Is the frameCallback function actually being called for the bad frames? If so, we could change the behavior to at least stop acquisition once the correct number of frames, good or bad, has been received.
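A minimal sketch of that suggestion, assuming a hypothetical callback that is told whether each frame is good or bad: every callback counts toward the requested total, so acquisition stops once that many frames have arrived. `FrameCounter`, `on_frame`, and the `ok` flag are illustrative names, not the driver's or PvAPI's actual API.

```python
from dataclasses import dataclass


@dataclass
class FrameCounter:
    """Hypothetical per-acquisition bookkeeping; not the real driver state."""
    num_images: int          # frames requested (1 in Single mode)
    good: int = 0
    bad: int = 0
    acquiring: bool = True

    @property
    def frames_remaining(self) -> int:
        # Bad frames are counted too, so they can no longer stall the count.
        return self.num_images - self.good - self.bad

    def on_frame(self, ok: bool) -> None:
        """Called once per frame callback; `ok` is False for a bad frame."""
        if not self.acquiring:
            return
        if ok:
            self.good += 1
        else:
            self.bad += 1
        if self.frames_remaining <= 0:
            self.acquiring = False  # stop instead of waiting forever
```

With `num_images=3` and callbacks reporting good, bad, good, the counter ends with two good frames, one bad frame, and acquisition stopped. This does not cover frames for which the callback never fires at all; that case still needs a timeout.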
Yes, it is most of the time, although I found some instances where it is not called at all. The rate of bad frames correlates with the CPU usage of another IOC, so I guess the prosilica thread might be starved of CPU time and unable to keep up with the data rate?
The machine has 12 cores, one IOC is consuming ~350% and the prosilica IOC is consuming ~100%, so I wouldn't expect it to be an issue.
I will try pinning the IOC to a set of CPUs and see if that helps.
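For a running IOC, `taskset -a -p <cpus> <pid>` pins all of its threads. The sketch below does the same from Python on Linux, assuming access to `/proc`; the PID and CPU set are placeholders, and this is an illustration rather than the exact procedure used in this thread.

```python
import os


def pin_all_threads(pid: int, cpus: set[int]) -> dict[int, set[int]]:
    """Set the CPU affinity of every thread of `pid` (Linux only).

    os.sched_setaffinity acts on a single task ID, so each entry under
    /proc/<pid>/task is pinned individually, like `taskset -a -p`.
    Returns {thread_id: affinity reported by the kernel}.
    """
    masks = {}
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            os.sched_setaffinity(tid, cpus)
            masks[tid] = os.sched_getaffinity(tid)
        except ProcessLookupError:
            pass  # the thread exited while we were iterating
    return masks
```

Pinning only the main thread (a plain `os.sched_setaffinity(pid, cpus)`) would leave already-running worker threads, such as the frame-receive thread, where they were.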
I tried setting GvspResendPercent to 100%, but it didn't seem to help much.
If one thread in the prosilica IOC is using all or most of that 100%, that might be the problem. Idle cores won't help if a single thread is maxing out its core.
cam07 is the one driving the CPU usage high. cam03 is the one giving me grief, even though none of its threads is getting to 100%. CPU pinning didn't help. Does the PvAPI library use only one thread to handle all requests from different IOCs?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4096 cam07 20 0 2553m 410m 5932 R 96.6 1.3 2926:56 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4058 cam07 20 0 2553m 410m 5932 R 94.0 1.3 2970:24 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4030 cam07 20 0 2553m 410m 5932 R 70.6 1.3 7769:35 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
26614 cam03 20 0 529m 83m 5172 R 55.1 0.3 10:51.27 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
4463 cam07 20 0 2553m 410m 5932 R 28.9 1.3 2390:17 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
26573 cam03 20 0 529m 83m 5172 S 20.5 0.3 3:37.64 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
26606 cam03 20 0 529m 83m 5172 R 18.8 0.3 3:40.06 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
22796 cam07 20 0 2553m 410m 5932 S 15.3 1.3 10:19.45 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
26594 cam03 20 0 529m 83m 5172 S 12.2 0.3 2:35.21 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
4136 cam07 20 0 2553m 410m 5932 S 11.0 1.3 1595:23 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4032 cam07 20 0 2553m 410m 5932 R 10.0 1.3 1279:10 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4028 cam07 20 0 2553m 410m 5932 S 7.2 1.3 881:15.71 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
26628 cam03 20 0 529m 83m 5172 S 7.2 0.3 1:32.90 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
19505 cam07 20 0 2553m 410m 5932 S 4.8 1.3 18:21.02 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4151 cam07 20 0 2553m 410m 5932 S 4.3 1.3 588:21.18 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4155 cam07 20 0 2553m 410m 5932 S 4.3 1.3 649:44.59 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
26659 cam03 20 0 529m 83m 5172 S 3.8 0.3 3:23.81 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam03/st.cmd
4034 cam07 20 0 2553m 410m 5932 S 1.7 1.3 212:36.21 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
4159 cam07 20 0 2553m 410m 5932 S 0.7 1.3 40:17.57 ../prosilica/bin/linux-x86_64/prosilica /epics/iocs/cam07/st.cmd
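To check the "one thread maxing out a core" hypothesis against per-thread `top -H` output like the listing above, a small parser can total %CPU per user and flag the busiest threads. A sketch, assuming the column layout shown (PID USER PR NI VIRT RES SHR S %CPU ...); in `top -H` the PID column holds thread IDs. The function name and the 90% threshold are illustrative.

```python
from collections import defaultdict


def summarize_threads(top_lines, busy=90.0):
    """Return {user: (total %CPU, max thread %CPU, [busy thread IDs])}.

    Expects `top -H` rows: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    """
    total = defaultdict(float)
    peak = defaultdict(float)
    hot = defaultdict(list)
    for line in top_lines:
        fields = line.split()
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # skip the header and anything malformed
        tid, user, cpu = int(fields[0]), fields[1], float(fields[8])
        total[user] += cpu
        peak[user] = max(peak[user], cpu)
        if cpu >= busy:
            hot[user].append(tid)
    return {u: (round(total[u], 1), peak[u], hot[u]) for u in total}
```

Applied to the full listing, cam07 totals about 349% with two threads above 90% (4096 and 4058), while cam03 totals about 118% and its busiest thread sits at 55.1%, consistent with the observation that no cam03 thread is pegged.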
Have you applied the system changes discussed in this tech-talk thread?
http://www.aps.anl.gov/epics/tech-talk/2013/msg00787.php
It involves increasing net.core.rmem_default and net.core.rmem_max.
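A sketch for checking the current limits before and after the change, assuming Linux's `/proc/sys/net/core/` files. `TARGET_BYTES` is a placeholder, not the value recommended in the tech-talk thread; the actual change is applied with `sysctl -w` and made persistent with a file under `/etc/sysctl.d/`.

```python
from pathlib import Path

# Placeholder target in bytes; NOT the value from the tech-talk thread.
TARGET_BYTES = 8 * 1024 * 1024


def check_rmem(root="/proc/sys/net/core", target=TARGET_BYTES):
    """Return {setting: (current value, meets target)} for the receive-buffer sysctls."""
    result = {}
    for name in ("rmem_default", "rmem_max"):
        current = int(Path(root, name).read_text().split()[0])
        result[name] = (current, current >= target)
    return result
```

The `root` parameter exists only so the function can be pointed at a copy of the files; on a live system the default path is used.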
That did the trick! No bad frames anymore. Thanks!
Although if the driver does perchance receive one in Single or Multiple mode, it will still get stuck :)
Note that the link to the Point Grey Knowledge Base article in my old tech-talk thread no longer works. However, this link does work:
We just re-discovered this issue with cameras at NSLS-II. It is not clear how to fix it, as Bruno said above. Questions:
How long should we wait before timing out? In Single mode this could be AcquirePeriod*margin + minimum. Using AcquirePeriod rather than AcquireTime lets the user avoid timeouts with external triggers by setting AcquirePeriod larger than the time between Acquire=1 and the actual trigger.
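A sketch of the proposed Single-mode timeout, timeout = AcquirePeriod × margin + minimum. The default `margin` and `minimum` values are assumptions chosen only for illustration.

```python
def frame_timeout(acquire_period: float, margin: float = 2.0, minimum: float = 1.0) -> float:
    """Seconds to wait for a frame before declaring a timeout.

    Based on AcquirePeriod rather than AcquireTime so that, with an external
    trigger, the user can lengthen the wait by raising AcquirePeriod.
    The `margin` and `minimum` defaults are illustrative assumptions.
    """
    if acquire_period < 0 or margin < 1 or minimum < 0:
        raise ValueError("invalid timeout parameters")
    return acquire_period * margin + minimum
```

For example, `frame_timeout(0.5)` gives 2.0 s, and an externally triggered camera expected to wait up to 10 s for its trigger could set AcquirePeriod to 10 s, giving `frame_timeout(10.0, margin=1.5, minimum=0.5)` = 15.5 s.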
What to do on timeout? Try again? Return error? Return dummy frame and error?
How to handle Multiple mode where more than 1 frame could be dropped?
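One possible answer to the last two questions, sketched under assumptions: wait for each of the `num_images` frames with a per-frame timeout, count bad and missing frames separately, and end the acquisition with an error status instead of hanging. The queue stands in for whatever the frame callback would feed; `AcquireResult`, `wait_for_frames`, and the status strings are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class AcquireResult:
    good: int
    bad: int
    missing: int

    @property
    def status(self) -> str:
        return "ok" if self.bad == 0 and self.missing == 0 else "error"


def wait_for_frames(frames: "queue.Queue[bool]", num_images: int, timeout: float) -> AcquireResult:
    """Consume up to num_images frame flags (True = good) from the callback queue."""
    good = bad = 0
    for received in range(num_images):
        try:
            ok = frames.get(timeout=timeout)
        except queue.Empty:
            # Nothing arrived in time: treat all outstanding frames as missing.
            return AcquireResult(good, bad, num_images - received)
        if ok:
            good += 1
        else:
            bad += 1
    return AcquireResult(good, bad, 0)
```

Giving up at the first timeout, rather than waiting a full timeout for every remaining frame, bounds the worst-case hang to one timeout period. Whether the driver should then retry, return an error, or deliver a dummy frame remains the open policy question.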
When in Single or Multiple mode, the driver sets framesRemaining to 1 or numImages, respectively. However, when it receives a bad frame, it neither decrements framesRemaining nor reissues a trigger, which leaves the driver stuck in the Acquire state.
Although ideally the driver shouldn't be receiving bad frames, I think it shouldn't get stuck when it does. However, I don't know how to address that properly. Should it fail and return an error? Should it reissue the acquisition for the frames that came in bad? What about hardware triggers?
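A sketch of one policy for these questions: with software triggering, reissue a trigger for each bad frame up to a retry limit; with hardware triggering, where the driver cannot request a replacement, fail immediately with an error. `trigger` and `read_frame` are hypothetical callables standing in for the driver's trigger and frame-wait logic, and `max_retries` is an assumed parameter.

```python
from typing import Callable


def acquire_with_retries(
    num_images: int,
    trigger: Callable[[], None],
    read_frame: Callable[[], bool],
    software_trigger: bool,
    max_retries: int = 3,
) -> tuple[int, str]:
    """Return (good frames collected, status); never leaves the caller stuck."""
    good = retries = 0
    while good < num_images:
        if software_trigger:
            trigger()  # (re)issue the trigger for the next or replacement frame
        if read_frame():  # True = good frame
            good += 1
            continue
        # Bad frame: only a software trigger can ask for a replacement.
        if not software_trigger or retries >= max_retries:
            return good, "error"
        retries += 1
    return good, "ok"
```

In Single mode (`num_images=1`) with software triggering, a bad frame followed by a good one completes normally after one extra trigger; under hardware triggering the same bad frame ends the acquisition with an error instead of leaving it in the Acquire state.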