Closed: LitterBugs closed this issue 7 years ago
Just received released board. Going to retest on the "official" board.
I'm just back from vacation now and am thinking about this.
Seppuku isn't vulnerable to the same things that RE1 was in this area.
But all three of the graphic OSD boards employ the same approach-- try to clamp video to a correct DC level, compare it to a (dynamic or static) threshold, and fire interrupts from that comparison that do stuff on the FC. If extreme signal levels make the bias level drift, the threshold can start hitting other features of the video signal, which creates an interrupt storm and ties up the FC from doing other things.
I don't know if you have a scope, but it would be interesting to know the peak-to-peak levels your camera is outputting, especially if they are very high (>2 V peak-to-peak). From the bloom artifacts you're getting in bright light, it looks like you're using a CCD camera, which is worth noting too.
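As a rough illustration of the failure mode described above, here is a toy model of a sync slicer. It is not the firmware's code; the sample levels, the 150 mV threshold, and the helper names `make_line` and `count_sync_edges` are made up for the example. With a correct DC level only the sync tip crosses the threshold, but once the whole line has drifted downward, every dark picture detail crosses it too.

```python
def count_sync_edges(samples_mv, threshold_mv):
    """Count downward crossings of the slicing threshold.

    Each crossing stands in for one event the sync logic would react to.
    """
    edges = 0
    below = samples_mv[0] < threshold_mv
    for level in samples_mv[1:]:
        now_below = level < threshold_mv
        if now_below and not below:
            edges += 1
        below = now_below
    return edges


def make_line(bias_mv=0):
    """Build one synthetic video line (hypothetical levels, in millivolts)."""
    blank, sync_tip, white, dark = 300, 0, 1000, 350
    # Blanking, sync tip, back porch, then 20 bright/dark picture transitions.
    line = [blank] * 2 + [sync_tip] * 5 + [blank] * 5 + [white, dark] * 20 + [blank] * 2
    return [level + bias_mv for level in line]


THRESHOLD_MV = 150  # assumed slicing level for this sketch

print(count_sync_edges(make_line(0), THRESHOLD_MV))     # 1
print(count_sync_edges(make_line(-250), THRESHOLD_MV))  # 20
```

In the drifted case the real sync pulse is not even counted, because the line never rises above the threshold before it; all 20 events come from picture content.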
There are some defenses to this type of situation I've been wanting to implement for a while. Long ago, on the original Brain, @mluessi implemented a quick return from the vsync interrupt when it arrives too early (and this logic is used on Seppuku), but it would be better to disable the interrupt entirely and then re-enable it when it is reasonable.
Doing that "right" will take some thought / a little time.
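To make the difference between the two defenses concrete, here is a small simulation. It is only a sketch: the class names, the 15000 us minimum spacing, and the edge timestamps are assumptions for illustration, not values from the firmware. The quick-return version still enters the handler for every edge, while the masked version never sees edges that arrive before the re-arm time.

```python
class QuickReturnVsync:
    """The handler runs for every edge but bails out early if the edge is too soon."""

    def __init__(self, min_interval_us):
        self.min_interval_us = min_interval_us
        self.last_us = None
        self.isr_entries = 0  # how often the CPU was interrupted
        self.frames = 0       # how many edges were accepted as real vsyncs

    def edge(self, now_us):
        self.isr_entries += 1
        if self.last_us is not None and now_us - self.last_us < self.min_interval_us:
            return  # too early: quick return
        self.last_us = now_us
        self.frames += 1


class MaskedVsync:
    """The interrupt is masked after each accepted edge and re-armed later."""

    def __init__(self, min_interval_us):
        self.min_interval_us = min_interval_us
        self.rearm_at_us = 0
        self.isr_entries = 0
        self.frames = 0

    def edge(self, now_us):
        if now_us < self.rearm_at_us:
            return  # models a masked interrupt: the handler is never entered
        self.isr_entries += 1
        self.frames += 1
        self.rearm_at_us = now_us + self.min_interval_us


# Three real field syncs (about 16683 us apart for NTSC) plus four spurious edges.
edges_us = [0, 50, 100, 150, 16683, 16700, 33366]
quick, masked = QuickReturnVsync(15000), MaskedVsync(15000)
for t in edges_us:
    quick.edge(t)
    masked.edge(t)

print(quick.frames, quick.isr_entries)    # 3 7
print(masked.frames, masked.isr_entries)  # 3 3
```

Both variants accept the same three frames; the difference is how many times the CPU is interrupted, which is what matters during a storm of spurious edges.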
Grabbed some shots off the scope of the TBS ZeroZero signal with various lighting. Have not been able to get it to reboot on the table yet.
Lens covered:
Normal scene:
Bright Light in center of screen:
Bright light in upper right of screen:
Hope to get out to the field tomorrow....
@litterbugs -- I've passed lots of bad video to Seppuku of various forms and cannot reproduce. Have you had any luck?
Mike, Just gave it another shot with a flashlight on the bench and was able to duplicate a reboot unarmed.
Karl
Seems to happen when there is a vertical line lens flare and it crosses the right edge of the screen. Here are some screengrabs from the 2nd video I posted at the time of the reboot. I was able to duplicate this scenario with a flashlight close to the camera. I'll try to get it on DVR, and also see if I can rig my scope up to capture the raw video.
Was able to duplicate on table again. This time lens flare in center of the screen was able to trigger.
Single-stepping through the frames just before the OSD dropout, I noticed that this time there was a "Sensors" alarm that did not appear the previous times. One other thought: could it be possible that something on the FC is photosensitive?
I am going to try exposing the FC to light and see if it can be triggered. I will also wire up my "released" Seppuku tonight and move this camera over to it to try to duplicate it on another controller.
Thanks for your effort @LitterBugs 👍 .
Yes-- thank you @LitterBugs.
I think you may have hit the nail on the head here-- very very clever. It can't really be a sync issue if the OSD is still displaying up to the moment of reset and is even able to communicate an alarm-- if sync was garbage the OSD would be tearing or missing.
cc/ @DTFUHF
> One other thought, could it be possible that something on the FC is photo sensitive?
The baro definitely is-- the holes on the package open up to the die and it's on the same bus as the other sensors. Every other IC would be photosensitive also if their package was damaged/cracked-- more to UV than other light, but even just to visible light.
Usually the photosensitive stuff on the baro has only been seen to mess up altitude/analog stuff (this is a primary reason for foaming the baro in builds), but if it were to get "stuck" on the SPI etc it would cause a lot of problems.
(I am also going to look over datasheets-- we're not using any CSP packages, which are known to not completely protect the chip from light-- but just to make sure)
I will also enclose the FC and see if it can still be duplicated just with light hitting the camera....
Both times when it happened in the air, the quad was still completely controllable for about 4 seconds after the OSD disappeared. That makes me think this is an OSD crash....
Tried shining light on the FC only. Did get changes on the baro, but no crashes. As soon as I started back on the camera, I was able to get it to reset several times almost on command. Going to try grabbing the raw camera signal stream with my scope and work on wiring up my other Seppuku later tonight. In theory, I can play back the stream from my scope for testing too. Will give that a shot as well.
Just disable the OSD module and repeat what you know will cause a reset as a test. My early revision board also had sensor issues and resets under the right conditions. I just got the new Seppuku soldered up and hope to test it shortly.
I was able to duplicate this with a fresh Seppuku FC, clean install of Artifice, and the same camera. Shining light on the FC does not trigger it. I can make the Baro readings vary by 100+ meters, but it does not crash/reboot. Will be much easier to get the scope on it without a frame and excess wiring in the way. Going to throw a couple different cameras at it too.
@pug398 Disabling the OSD isn't really an option. That's the first indicator that the problem has happened. Still have full control 4 seconds after OSD goes away.
If you connect the test setup to the GCS and wave the flashlight in front of the camera, does the sensor warning go red? If it does, go to the OSD tab and untick enable OSD. Reboot and try again and see if the sensor warning still goes red.
Good idea @pug398. If it still crashes without OSD module running...
Not sure whether the sensor warning was cause or effect, but I'm really leaning towards effect based on what actually triggers the problem. I just tried a Runcam Swift in place of the TBS ZeroZero and cannot duplicate it with the settings it has. I'll hook up the Runcam OSD cable and crank the contrast to see if it will do the same thing. It's quite obvious the ZeroZero contrast is too high. For now it looks specific to the signal of the ZeroZero camera when it blooms badly. Most of these cameras are based on the same sensor and chips but have different default settings. So if it can happen for one, it may happen for others.
The GCS link drops when the reboot happens. I'll be able to test with OSD disabled as well. Working on that now.
This does look more like OSD-- I looked at the video and there are sync artifacts.
One other thing that could be related-- if the camera in question is AC coupled (has a big capacitor on its output), the relatively large capacitor we have on video input could perhaps be a problem with level drift.
We have a relatively large ceramic (2.2uF) capacitor on the input because it's good for video quality in general-- but if there's already a coupling cap on the other end of the wire the net effect can be to do odd things vs. a smaller cap (0.1uF).
(BTW, I have not looked at the code, but I am not sure disabling OSD actually disables the parts of the pios_video driver that would be the problem.)
Disabled the OSD and no reboot.
Changing contrast on Runcam swift did not cause it to be problematic.
Looking through my box o parts for other cameras to test with.
So far none of the other cameras I've tried will result in this problem. Going to work on capturing video with the scope to see if playing it back through my scope will duplicate the issue. If it will, I can attach the capture file. If not, I can at least get a sample screengrab of what the signal looks like that is triggering it.
Would hooking up an OpenLager be of any use?
Harder to duplicate with the scope attached, but still possible. Is the signal level supposed to drop down to sync level in the middle of a line?
It definitely shouldn't drop down to sync level. Maybe try scoping the HSYNC and VSYNC signals coming from the sync detector on Seppuku; it may detect spurious hsyncs when the level drops low.
This must be some of the special sauce that is baked into the ZeroZero. It is on my $#!t list for this and lack of adjustability.
@LitterBugs -- no, it definitely should not.
Where were you scoping this? The video test point on the bottom of seppuku, or straight from the camera?
If it's straight from the camera that's definitely no good.
We do need some kind of fix for too-quick-interrupts.
@mluessi -- does RE1 do anything other than thresholding? It looks to me like this has to be vsync (since hsync doesn't fire an interrupt except on overflow/update).
This was scoped directly from the camera at the input to the Seppuku. Anything else you want me to check?
@mlyle yeah, the sync detector in RE1 needs the signal to be low for at least 1us for hsync to be detected, so it may get rid of some spurious detections.
One way to fix this could be to disable the hsync interrupt until the DMA SPI transfer is complete, but it would need a 2nd interrupt per line to re-enable it, which is not ideal.
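The minimum-low-time idea can be sketched as a pulse-width qualifier. This is only an illustration of the principle, not RE1's actual detector; the function name, the 100 ns sample period, and the test waveform are assumptions. Pulses that stay low for less than the minimum are simply ignored.

```python
def qualified_sync_pulses(low_samples, sample_period_ns, min_low_ns=1000):
    """Count low pulses whose duration is at least min_low_ns.

    low_samples is a sequence of booleans, True while the signal is below
    the sync threshold. A pulse is evaluated when it ends.
    """
    pulses = 0
    run = 0
    # A trailing False closes a pulse that is still low at the end of the data.
    for is_low in list(low_samples) + [False]:
        if is_low:
            run += 1
        else:
            if run > 0 and run * sample_period_ns >= min_low_ns:
                pulses += 1
            run = 0
    return pulses


# Hypothetical capture at 100 ns per sample:
# a 200 ns glitch, a 4.7 us hsync pulse, then a 500 ns dip in the picture area.
capture = (
    [True] * 2 + [False] * 10
    + [True] * 47 + [False] * 10
    + [True] * 5 + [False] * 3
)

print(qualified_sync_pulses(capture, 100, min_low_ns=0))     # 3
print(qualified_sync_pulses(capture, 100, min_low_ns=1000))  # 1
```

With no minimum, all three low-going excursions count as sync; with the 1 us requirement only the real pulse survives.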
I guess the good news in all of this is it takes some really bad non-spec video to trip it up. I'll throw it on a RE1 tonight and see how it handles it.
@litterbugs -- yes, I think that about says it. I do think we need protection against lines toggling too quickly. People have crashed FCs before by hooking up FrSky high-frequency PWM (90 kHz) to serial or PWM decode lines, or there's this kind of issue. It's just tricky to do in a general way.
Tested RE1 with the same camera for over 5 minutes and could not get it to trip up. On a side note, Trappy put out a hack for the ZeroZero to enable the settings menu, so I may be able to adjust the bloom out of it and use it on the build this RE1 is going in. I'll keep it on hand for future testing. Just received a couple of Runcam Minis that can go into my beater Seppuku test mule and my Seppuku clean build.
I've tested a handful of other cameras on the Seppuku other than my ZeroZero and have not been able to duplicate the problem. I put a support ticket in with TBS. Will be getting a V2 version of the camera, which I will test when it arrives. Will also try enabling the config of the V1 camera to see if changing the settings can "fix" the camera output. This camera has some really good low-light properties, but obviously it has issues in bright light with its current configuration.
Testing results:
TBS ZeroZero V1 - Fail
STRIX Ochi 650 - Pass
Runcam Swift - Pass
Runcam Swift Mini - Pass
HS1170 - Pass
I have a handful of other cheap non-adjustable CMOS cameras which I will test as well.
The only thing I can add to this is that the composite sync pulse can be scoped coming off the sync detector (pin 1), and when there is a sharp contrast change you can see the horizontal pulse distorting: it comes up off the low level by maybe 100-150 mV and then goes back to low again. Without any hysteresis I could easily see this being detected as 2 horizontal pulses instead of 1.
@pug398 I don't think extra hsync should be a problem, because all it does is start a counter for when clocking out the line should actually begin (and this is hardware, no interrupt, etc).
Extra, frequent vsyncs could be very, very bad though-- and the timing @LitterBugs is seeing, at the end of the line, is about the worst case for that.
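The double-detection concern can be shown with a toy comparator model. Everything here is assumed for illustration: the waveform, the 120 mV and 600 mV thresholds, and the function name. Whether the real input threshold sits anywhere near the 100-150 mV bump is not established in this thread. With a single threshold the bump splits one pulse into two; with separate falling and rising thresholds it is ignored.

```python
def count_low_pulses(samples_mv, fall_mv, rise_mv):
    """Count low-going pulses on a signal that starts high.

    The state goes low when the level drops below fall_mv and returns
    high only when the level exceeds rise_mv. Setting fall_mv == rise_mv
    models a comparator without hysteresis.
    """
    is_low = False
    pulses = 0
    for level in samples_mv:
        if not is_low and level < fall_mv:
            is_low = True
            pulses += 1
        elif is_low and level > rise_mv:
            is_low = False
    return pulses


# One csync pulse with a 150 mV bump in the middle of the low period.
csync_mv = [3300] * 3 + [0] * 4 + [150] * 2 + [0] * 4 + [3300] * 3

print(count_low_pulses(csync_mv, fall_mv=120, rise_mv=120))  # 2 (no hysteresis)
print(count_low_pulses(csync_mv, fall_mv=120, rise_mv=600))  # 1 (with hysteresis)
```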
The scoped artifact seems to be "very colorful black"-- e.g. colorburst frequency on a black signal causing it to poke down. I actually have pads down for a lowpass filter because I thought it might be required in some cases. @dtfuhf may decide based on this to populate the extra passive for that filter to make us more robust to this kind of thing.
I am going to mark this as duplicate of #1340, as it's the same core issue and should-- ideally-- be solved together.
I'll scope the H&V sync at the LM1881. Should be able to get both at same time since this is a dual channel scope. Would you like it added to this thread, or #1340 ?
Here is fine, #1340 will refer to it. (And this can be a useful place to note for hardware revisions, though this bug tracker isn't really hardware focused.)
(You might want to compare video & csync, to see if it's easier to catch the anomaly.. and then if so video & vsync).
@LitterBugs I really appreciate all your help and patience with this. I know it's been a pain. All in all, a very strange problem.
A few vertical sync pulses are dropped but don't really see any added ones.
@pug398 -- do you have a repro of the crash from overpeaked video?
@mlyle Really not a pain and very glad to help. I'm getting a better understanding on how it all works and eventually I'd like to get into some light coding. Helping find problems and diagnose them has been a big part of my life.
@litterbugs We'd love the help :D
My coding skills are very outdated.... Better at reading, tracing, and debugging than developing. Still trying to get a good mental map of how it is all laid out and linked together. Started putting together a dev environment once. Need to get back to that after I get all my builds done.
Yeah, I should have said C/V sync. If those are logic rather than analog, I can actually capture them all. I can do a total of 8 channels, but only two can do analog.
@LitterBugs they're "logic"-- though @pug398 sees a slight movement on one of the signals with no pulse.
re: dev-- It's a big codebase and it's a bit arcane. We're steadily removing code that does nothing and reorganizing ;). When you get around to it, if you have any particular area of interest I'd be glad to show you around that part of the code.
So here's a quick sample of good video with the lens cap on. Video on input A (top), Csync on input 0, and Vsync on input 1. Set trigger on Vsync and it works like a charm.
VSync seems to be normal even during an incident. Going to zoom in on Csync to see if anything shows there.
I've never used bitscope, but is the sampling rate high enough to capture csync correctly? It looks like lots of csync edges are missing / jittered out.
(I also wonder if we could be missing similar, short edges on vsync)
Yes, but my display isn't big enough so the screen shows aliasing. Need to get my bigger monitor hooked up to my laptop.
I'm also trying a couple other tools to see if I can get a better sample.
Issue details: Under the right lighting conditions, the OSD loses vsync with the camera or switches from NTSC to PAL and disappears. Four seconds later the FC reboots. This looks to be similar to the issue we had with the RE1.
I can reproduce this with the sun in the upper right corner of the screen.
EDIT: Add Build Details
RMRC Hellbender 204 V1 frame
Seppuku pre-release test FC/OSD
TBS zero/zero camera
RMRC cricket 200mw VTX
Omnivision 3 lobe RHCP antenna
BrainFPV mPB PDB
Speedex ES 25A OS125
Spektrum FPV Racing diversity RX
Sunnysky x2204s-16 2300kv motors
DALPROP 5045 triblade
Tattu 1550mah 4S 75C
screen grabs from video:
Just before OSD glitch
OSD format shift
OSD Loss
WDOG after FC rebooted in flight
Two videos demonstrating the issue https://youtu.be/EOTPI6e50Zo
https://youtu.be/BubItLA9I7U