Motion-Project / motion

Motion, a software motion detector. Home page: https://motion-project.github.io/
GNU General Public License v2.0

Pass-through recording #124

Closed hyperbolic2346 closed 6 years ago

hyperbolic2346 commented 8 years ago

Most IP cameras send a stream like h264. When we record a motion event, why not just write that stream to disk instead of doing the cpu intensive conversion to jpg and then back to mp4? I could potentially see this being an option since marking the video with things like the timestamp or motion boxes could be problematic here, but even in those situations it seems we could leverage ffmpeg to do something more efficient than h264 -> jpg -> mp4.

jogu commented 8 years ago

"I could potentially see this being an option since marking the video with things like the timestamp or motion boxes could be problematic here"

I'm wondering if we could still manage to do that. I think h264 in an mkv container would allow us to overlay arbitrary content without re-encoding the video; see e.g. http://forum.videohelp.com/threads/375001-%5Bhow-to%5D-Overlay-image-title(s)-on-AVC-H-264-video-without-recoding

I also wonder if a more direct recording process like this might allow us to include audio in the recordings too.

Mr-Dave commented 8 years ago

I'd want to just start with a basic pass through. For audio, I did have this in our FAQ

MGAndreasen commented 7 years ago

What's the future of this idea? It looks to me like this is what 90% of the people using this software are really doing in their systems. It would also lower the hardware requirements, letting larger home installations run better, and even let the RPi handle much more :=)

hyperbolic2346 commented 7 years ago

This feature would allow for less processing required to write out motion events. We would still need to decode the video stream to do the motion detection, but it wouldn't require encoding again at the cost of not being able to alter the video with motion boxes or timestamps.

tosiara commented 7 years ago

Just to confirm, pass-through writing would remove timestamp and motion box?

hyperbolic2346 commented 7 years ago

Unless we could come up with a fancy way to add it to the existing stream without decoding it. I'm not saying that ffmpeg won't allow such things, I just don't know. I think it's safest to assume that we won't be able to do it if passthrough is enabled.

jogu commented 7 years ago

As per my earlier comment, some codecs/containers (e.g. h264 within mkv) allow overlays to be added without re-encoding the video, which may allow us to keep some form of timestamp / motion boxes - though it's not an insignificant problem to try and get such overlays back from the motion thread and saved into the correct place in the stream.

tosiara commented 7 years ago

@jogu if you have an example of MKV with overlay - please share. This would be a really interesting feature

jogu commented 7 years ago

The link in my first comment contains some examples in there somewhere I believe.

tosiara commented 7 years ago

Oh, I see. So we will be adding a sequence of images. I wonder what overhead we would get this way? 10 fps of 1280x720 transparent PNGs is like 10*1K bytes per second? Shouldn't really be much

MGAndreasen commented 7 years ago

This would, no matter what, use a lot less CPU time than the current way of decoding, adding the timestamp etc., and then re-encoding the stream. I would assume that without the manipulation, five cameras at 1280x720@25fps would be possible on an RPi 3.


Mr-Dave commented 7 years ago

I have evaluated this further and do not believe that it is going to be feasible to implement. The images must be decoded and then encoded even if going from/to the same format. Possibly we could avoid conversions of pixel formats and use substream processing in #123 but the decoding and encoding steps must remain.

If you are interested in the technical reasons, it is related to IPB frames and the timings that occur between the source and the destination. The decoder/encoder makes sure we get and put the appropriate frames at the right time.

Mr-Dave commented 7 years ago

I am closing this as "unsolvable" due to the IPB frames and the fact that frames are not independent as provided from the source. The decoder step uses multiple frames in order to provide a single decoded frame. As such, we can not just take one frame out of that sequence and write it out. We also can not guarantee that we are always grabbing the frames at the exact same rate that they are being provided by the camera.

hyperbolic2346 commented 7 years ago

I haven't done much research into the actual data stream, but aren't there keyframes involved? Could we record the stream in a ring buffer and go back to the last keyframe? As I said, I could be WAY off here, but that was just my assumption of how this works.

Mr-Dave commented 7 years ago

You are right that there are keyframes. (They are the I frames). I did actually think about just grabbing and coding those, but then the timing gets off on capture vs write since the camera will be operating at its own pace of sending keyframes. And if we start doing this processing for this type of frame vs that type of frame, then that is basically rebuilding our own decoder/encoder.

Note that I did actually implement this in my own test branch. (That branch is currently not pushed up since it wasn't clean but can if you'd like to modify the hard coded ips and test it). I grabbed the packets from a secondary stream, left it in the ffmpeg packet format rather than send it through the decoder, then wrote it out directly without going through the encoder. H264->H264 in the mkv container. The resulting videos were unplayable. At best only a few frames flashed on the screen. Most of the samples just resulted in garbage. Once I did this, I realized that the IPB packets really need to match perfectly to the source in order for this to work.

jogu commented 7 years ago

I believe this can be done, but it's likely a significant change from the current motion architecture.

As hyperbolic2346 states, we'd need to buffer the entire stream based on iframes, only discarding the previous iframe (and everything after it) when the motion detection thread has processed the next iframe and decided there's no motion.

Then if we do detect motion, we need to write out everything from the last saved i-frame, up until the final i-frame after the motion ends, and (perhaps) update the video container to tell the player to not show the bits that are before/after what we actually wanted to record.

This is quite similar to how ffmpeg (the command line app) does things when you ask it to extract a section out of an existing video without reencoding it.
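The buffer-and-flush scheme described above can be sketched in C. Everything here is invented for illustration (a plain int frame id stands in for an AVPacket, and gop_buffer / feed_packet are hypothetical names, not Motion code): packets are held back until we know whether their GOP overlaps a motion event, flushed back to the last keyframe when motion starts, and written through until the first keyframe after motion ends.

```c
#include <assert.h>

#define MAX_PKTS 64

struct gop_buffer {
    int pkts[MAX_PKTS];   /* packets buffered since the last keyframe */
    int count;
    int recording;        /* currently inside a motion event? */
};

/* Feed one packet; returns how many packets were written to the movie. */
static int feed_packet(struct gop_buffer *b, int frame_id, int is_key, int motion)
{
    int written = 0;

    if (is_key && !b->recording && !motion)
        b->count = 0;             /* new GOP, old one had no motion: drop it */

    if (motion && !b->recording) {
        b->recording = 1;         /* flush everything back to the keyframe */
        written += b->count;
        b->count = 0;
    }

    if (b->recording) {
        if (!motion && is_key)
            b->recording = 0;     /* first keyframe after the event: stop */
        else
            written++;            /* pass the packet straight through */
    }

    if (!b->recording && b->count < MAX_PKTS)
        b->pkts[b->count++] = frame_id;

    return written;
}
```

With keyframes every 5 frames and motion on frames 7-12, this writes frames 5 through 14: from the keyframe before the event to the end of the GOP in which it finished, exactly the "start earlier / end later" behaviour discussed below.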

I did also discover that mp4 seems to have graphical subtitling capabilities similar to mkv's (e.g. BIFS, though it's pretty complex and I don't know how widely supported it is). mp4 is theoretically a better choice, as most browsers these days have built-in support for h264 in an mp4 container as part of the HTML5 standard video support.

Mr-Dave commented 7 years ago

Ok. I will open this back up but will stick by my original assessment that it can not be accomplished with a network camera and meet the needs of the application. i.e. We can't be dropping frames. Here is a proof of concept branch for those that wish to evaluate the method or results.

TermeHansen commented 7 years ago

Thank you for opening this branch; however, it fails for me at the link step...

netcam_rtsp.o: In function `netcam_next_rtsp':
/home/termo/motion/motion-git/netcam_rtsp.c:1326: undefined reference to `av_copy_packet'
ffmpeg.o: In function `ffmpeg_global_init':
/home/termo/motion/motion-git/ffmpeg.c:756: undefined reference to `avdevice_register_all'
ffmpeg.o: In function `ffmpeg_put_image':
/home/termo/motion/motion-git/ffmpeg.c:875: undefined reference to `av_copy_packet'
collect2: error: ld returned 1 exit status

MGAndreasen commented 7 years ago

To quote... "I could potentially see this being an option since marking the video with things like the timestamp or motion boxes could be problematic here"

Maybe some people don't care that much about motion boxes or timestamps. A first version of this option could simply omit them both, but notify the user that enabling it removes the motion box and timestamp from the recorded video.

iskunk commented 7 years ago

@Mr-Dave, I cannot find the "passthru" branch in your repo. Is it available elsewhere? I would like to study the changes you made.

jogu commented 7 years ago

I experimented to see what ffmpeg is actually capable of, and this command is quite good at spitting out videos, correctly breaking the recording at iframes, with pretty low cpu overhead:

ffmpeg -rtsp_transport tcp -i <rtsp url> -vcodec copy -acodec copy -y -map 0 -reset_timestamps 1 -segment_time 8 -f segment output%03d.mkv

(IIRC it doesn't work quite as well with mp4 output which is a bit of a shame but may well be fixable.)

It gives me some confidence we could achieve the same in motion, but we would need to make a number of changes to the way we detect motion / record to get it to work.

iskunk commented 7 years ago

@jogu, is the goal just to identify I-frames? I thought the desired approach was to keep things in the FFmpeg packet format, or at least something pretty close to that, that could then be written out without any {en,de}coding.

We would also want that as a prerequisite to motion-vector-based detection, as that information is lost in flat frame-images.

jogu commented 7 years ago

@iskunk No, identifying i-frames (or MPEG Group of Pictures) is just part of what we need to do to this. As you say, the goal is to keep the original ffmpeg packets to be able to write them out without re-encoding.

You're correct about motion-vector-based detection, but I think for the first stage of integrating passthrough recording we should continue to decode into flat pictures to feed into the existing motion detection code. Reworking the motion detection code can be done in a second stage.

iskunk commented 7 years ago

Motion-vector-based detection is also going to need, as another prerequisite, support for having more than one motion-detection algorithm, period. So yes, that's a whole 'nuther can of worms :-)

General question/proposal for how to handle GOPs and events:

Let's assume, for the sake of argument, that our RTSP camera is churning out H.264 video at 30fps with an I-frame every five seconds, at :00, :05, :10, and so on.

Let's also assume that the camera sees motion from :27 to :37.

At a minimum, then, pass-thru recording will produce a video from :25 to :37 (two extra seconds at the beginning to start with an I-frame), possibly ending later (up to :40) if B-frames are involved.

If you have a pre-record period of five seconds, then your video will start at :20 (not :22) for the same reason.

So pass-thru recording will impose this restriction on the granularity of when recorded video can start (and possibly end). Of course, we deal with it by making the video start earlier / finish later than it otherwise would have, so no event detail is lost.

Does all that sound reasonable?
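The timing rule in the example above can be captured in a small helper (hypothetical, not part of Motion): the clip must begin on the I-frame at or before motion start minus the pre-record period.

```c
#include <assert.h>

/* Snap the desired recording start down to the previous I-frame.
 * All values are in seconds; gop_len is the I-frame interval.
 * Names and units are invented for illustration. */
static int clip_start(int motion_start, int pre_record, int gop_len)
{
    int want = motion_start - pre_record;
    if (want < 0)
        want = 0;
    return (want / gop_len) * gop_len;  /* I-frame boundary at or before */
}
```

For the example in the thread (I-frames every 5 seconds): motion at :27 with no pre-record gives a start of :25, and with a 5-second pre-record a start of :20.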

jogu commented 7 years ago

Yes, that sounds reasonable. I think recording a few seconds extra either side of an event is absolutely fine.

FWIW, ffprobe -select_streams v:0 -show_frames shows that a 5 fps stream recorded from my reolink camera has a key_frame (which I believe is ffmpeg's language for an I frame) every 2 seconds.

I would be slightly surprised if B frames are used at all in live-streaming encoders, although it is possible.

iskunk commented 7 years ago

Okay, great---wanted to make sure everyone's aware of the start/end time restrictions this would impose.

Yeah, 5-second GOPs are pretty big; I just chose that number to make a clear example. 1- or 2-second GOPs appear to be typical. (Not sure if that number is affected by framerate.)

(And isn't that part of why there is some delay in opening an RTSP stream? The camera doesn't send you an I-frame right from the start; so you have to receive a few unusable P/B frames until the encoder gets around to making another I-frame?)

I did see somewhere that B-frames are not really a thing in IP cameras... I guess those would either impose some latency, or require predicting the future. But it shouldn't be a big deal to support those anyway.

iskunk commented 7 years ago

Oh, one more thing to be aware of: GOP size is apparently not a fixed parameter for a stream. Though it usually doesn't vary, it could vary... for example, one could imagine some kind of "smart" encoder that produces I-frames whenever it detects a large change in the scene, rather than wait until the next scheduled I-frame.

jogu commented 7 years ago

Yes, that makes sense. I think the documentation should have a fairly broad caveat: "If you use direct recording of RTSP streams, the recordings are very likely to start earlier and end later than the configured parameters would suggest - this is dependent on the camera."

(and yes, that's why there's a delay. It's really annoying and one of the reasons I like to view cameras through motion instead of directly when I want to quickly pull them up :-) )

iskunk commented 7 years ago

That sounds good. If we get to doing overlaid timestamps/boxes/etc., we could consider adding a "Motion begins in 3... 2... 1..." sort of countdown at the beginning.

I think I've got enough (from here and #374) to get started. Let me hack on this for a bit, and I hope to have some goodies for y'all soon :-)

jogu commented 7 years ago

Awesome, thanks @iskunk, looking forward to seeing what you come up with!

iskunk commented 7 years ago

All right, some preliminary stuff: Figuring out how AVPacket (raw stream data) and AVFrame (decoded frame) line up.

This patch against netcam_rtsp.c adds some strategic fprintf() calls that describe the incoming packets from av_read_frame() and the decoded frames from avcodec_receive_frame(). You can see which packets have the "key" (keyframe) flag set, and exactly what type of frame the decoded frame is.

Here is the output I got from a 100-second run against a single RTSP stream. (FYI, the scene is a completely static view of my living room.) Some observations:

  • There are two streams here, identified by index=0 and index=1. The first one is the video stream; the second one I'm pretty sure is the audio track. Obviously we'll want to hang on to both.
  • All audio packets have the "key" flag set, presumably since the audio doesn't have any sort of inter-frame compression going on like video does (i.e. you can decode each audio packet in isolation, without reference to any others).
  • Video packets with the "key" flag are sometimes very big (like 100kB), sometimes small (<10kB), and sometimes microscopic (11 bytes). This is a mystery to me, especially how the very first I-frame (plus audio) is transmitted in less than 10kB. (Remember, static scene.) Why the variability?
  • There is, apparently, neither a consistent 1:1 nor a many-to-one mapping of packets to frames. It's all over the place.
  • Dismayingly, there does not appear to be a consistent relationship between "key" packets and actual I-frames (see line 54 in the log, for example).

I think we might actually be using the FFmpeg API incorrectly here. avcodec_receive_frame() is sometimes expected to return AVERROR(EAGAIN) when it wants more packets before it can spit out a frame, but we're not handling that case---we're treating it as an error.

Is there some subtlety I'm missing here, or is that really a bug?

hyperbolic2346 commented 7 years ago

I wouldn't be surprised at all if netcam_rtsp.c is naive in the way it handles the stream. I wouldn't expect it to be very robust yet. I would suggest that you make any changes that you feel we need or find useful as you go through it.
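The EAGAIN handling being discussed can be illustrated with a stand-in stub in place of libavcodec (the mock_* names are invented; the stub "decoder" needs two packets per frame). mock_receive_frame() plays the role of avcodec_receive_frame(): a return of -EAGAIN means "feed me more packets", not a failure; FFmpeg itself encodes this as AVERROR(EAGAIN), which on POSIX is likewise a negated errno value.

```c
#include <assert.h>
#include <errno.h>

static int pkts_pending = 0;

static int mock_send_packet(void)           /* cf. avcodec_send_packet() */
{
    pkts_pending++;
    return 0;
}

static int mock_receive_frame(int *frame)   /* cf. avcodec_receive_frame() */
{
    if (pkts_pending < 2)
        return -EAGAIN;                     /* needs more input: NOT an error */
    pkts_pending -= 2;
    *frame = 1;
    return 0;
}

/* Correct handling: on EAGAIN, move on to the next packet instead of
 * logging an error; keep draining frames until the decoder runs dry. */
static int decode_stream(int npackets)
{
    int frames = 0, frame;

    for (int i = 0; i < npackets; i++) {
        mock_send_packet();
        for (;;) {
            int ret = mock_receive_frame(&frame);
            if (ret == -EAGAIN)
                break;                      /* not an error: next packet */
            if (ret < 0)
                return -1;                  /* genuine decode error */
            frames++;
        }
    }
    return frames;
}
```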

jogu commented 7 years ago

I tried the latest build (I've been running older code for a while) and got this error several times on startup and once overnight:

[ERR] [NET] [Apr 22 00:50:40] rtsp_decode_video: Error receiving packet: Resource temporarily unavailable

so I'd say there's pretty good chance you're right :-) A merge request that addressed that would be very welcome.

jogu commented 7 years ago

I believe you're correct that the other stream is audio - and correct that it doesn't have that same concept of IPB frames, I believe each audio packet is completely independent.

For figuring out what's going on, you may find that recording to disk with ffmpeg (eg. using the command given in https://github.com/Motion-Project/motion/issues/124#issuecomment-295449444 )

and then running:

ffprobe -show_frames <input mkv file>

will help you understand what's going on / give you something to compare the correctness of the motion code against.

Or you can even run it directly against the camera:

ffprobe -show_frames -rtsp_transport tcp -i <rtsp url>

Also, just as it hasn't been mentioned yet here or in the discussion on #374 - for the time-lapse videos motion records we will still need to decode then reencode (as any other option will result in significantly larger videos).

Audio recording is also something whose legality varies a lot across the world; we should probably default to dropping audio packets, but have a config file option that allows the user to enable them. (In the UK, as I understand it, in a home setting it's legally okay if the occupants agree and there's signage that informs visitors, and it can be highly encouraged in some commercial situations where there is risk. I think Mr-Dave has previously been worried about the complexities, but in the architecture we're talking about there's really no added complexity to record audio.)

iskunk commented 7 years ago

@hyperbolic2346: I've certainly found a lot of issues there already, like crash bugs :-] Yes, I'll smooth out any wrinkles I find. This is getting easier now that I'm getting a feel for it.

@jogu: Yep, I was getting those log entries too, and now we know why! I'll prepare a pull request. It'll be good to get the current functionality in tip-top shape before starting something new.

That particular issue wasn't too bad, because even though an error was printed, the code still did the right thing (feed more packets to the codec). I did find a new issue that was sort of the mirror-image: the one-packet-yields-multiple-frames scenario was not handled at all. Though this wasn't so bad either, since my camera (and presumably most cameras) don't do this. The pull request will fix this one too.

Since @Mr-Dave isn't going to be with us for a while, could you begin to tackle #368? My Git-fu may not be the greatest, but it's easier to work when you're not stepping on old patches...

I tried the second ffprobe command. Looks good (and boy does my camera generate a lot of errors and bad cseqs), though the format is a bit hard to read. This is a useful tool, thanks!

So I found out what was up with the weirdness in yesterday's log file: Motion was actually running with two RTSP cameras, contrary to what I believed. The second camera was actually from a DVR with no camera connected, so it was giving a minimalistic "no signal" stream with teeny-tiny packets.

I fixed the config, re-ran Motion, and generated a much more sensible-looking log file. I-frames are ~100kB, P-frames are ~5kB, and every "key" packet corresponds to an I-frame.

An open question is how the audio packets line up with the video. Do they describe audio for the preceding frames, or for the following frames? I thought the .pts fields (Presentation TimeStamp) would help clear things up, but the values of these for video vs. audio don't correspond at all. (They increase monotonically, like you'd expect, but they are at two completely different offsets.) Even ffprobe doesn't tell much in this way. Maybe the FFmpeg segment muxer code will have an answer.
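One way to line up the two streams, since raw pts values sit at different offsets: each stream has its own time_base (a num/den rational) and its own starting pts, so converting to seconds relative to each stream's first pts makes them comparable. A sketch with invented numbers (the time bases below are just illustrative, not from any particular camera):

```c
#include <assert.h>

/* Convert a stream pts to seconds relative to that stream's first pts.
 * tb_num/tb_den is the stream's time_base (cf. AVStream.time_base). */
static double pts_seconds(long long pts, long long first_pts,
                          int tb_num, int tb_den)
{
    return (double)(pts - first_pts) * tb_num / tb_den;
}
```

For example, a video pts 90000 ticks past its first pts in a 1/90000 time base and an audio pts 8000 ticks past its first pts in a 1/8000 time base both land at 1.0 second, even though the raw pts values are wildly different.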

Re time-lapse video: You don't suppose it would be feasible to just spit out selected I-frames at the configured interval...?

Re audio legality: Yes, I've heard of this. This won't be a problem, of course. It'll just be easier to add the filtering-out of audio packets later, than adding their inclusion later.

jogu commented 7 years ago

boy does my camera generate a lot of errors and bad cseqs

My camera (a reolink RLC-410) does the same. It /may/ be a bug in ffmpeg where it doesn't perform packet reordering (though why that would be necessary on a TCP connection is unclear to me). Several projects allegedly solved similar errors by switching from ffmpeg to http://www.live555.com for RTSP. If it is an ffmpeg bug I can't imagine it'd be that hard to solve...

Re time-lapse video: You don't suppose it would be feasible to just spit out selected I-frames at the configured interval...?

I'm sure it would work, but as we'd end up with a video made entirely of i-frames my gut feeling is that it'll be at least 3 times larger than the current way, without any noticeable gain in quality or cpu usage (there'd likely be small gains in both, but maybe nothing you'd notice IMHO given we only encode a frame once a second at most). It might be worth a try.

On audio: yes, I saw the same with the packet timestamps not seeming aligned to the video ones. It seems to basically work with the segment muxer, so hopefully there's a solution in there somewhere.

I'll merge #368 if it seems to do okay in my testing.

tosiara commented 7 years ago

Just FYI, the error EAGAIN has been seen to cause a segfault: https://github.com/Motion-Project/motion/issues/345, but it was fixed in the @Mr-DaveDev branch https://github.com/Mr-DaveDev/motion/tree/misc and merged a month ago https://github.com/Motion-Project/motion/pull/352

iskunk commented 7 years ago

Pull request #377 is in.

@jogu: I do see a lot of "concealing DC/AC/MV errors" in the log too, so it could possibly be a dodgy encoder on the camera as well. If there's an FFmpeg issue, I agree that they should be able to address it within their architecture...

Re time-lapse I-frames: Yes, the resulting videos would not be quite so compressed. An alternate possibility is an option to only use I-frames as input to the encoder, instead of the exact frame (likely a P-frame) at the interval mark. That way, you avoid any P-frame encoding artifacts getting compounded by another round of encoding. If your interval is e.g. 1 minute or longer, then the +/-1 second or so adjustment necessary to get the I-frame should be no big deal.

I saw you merged #368, thanks :-)

@tosiara: AVERROR(EAGAIN) is a documented part of the FFmpeg API, so Motion segfaults are unlikely to be due to that. (In all honesty, the RTSP code had a lot of issues, and I haven't even gotten in all my fixes yet...)

New stuff: For those interested, here's what I've put together so far. (This patch is very very raw; don't bother trying to apply it, let alone compile it)

There is a two-level linked-list structure to hang onto the packets (a list of GOPs, and each has a list of packets) in struct ffmpeg, and then a new function ffmpeg_save_packet() that adds new packets appropriately, with GOP breaks as detected. (No idea if this does the right thing for audio yet.)
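The two-level structure described above can be sketched as follows. This is a stand-in for the work-in-progress patch, not the patch itself: a plain int replaces the AVPacket, allocation failures are unchecked, and only the key-flag-starts-a-new-GOP rule from the description is modeled.

```c
#include <assert.h>
#include <stdlib.h>

struct pkt_node { int pkt; struct pkt_node *next; };
struct gop_node { struct pkt_node *pkts, *last_pkt; struct gop_node *next; };
struct gop_list { struct gop_node *head, *tail; };

/* Append a packet, opening a new GOP whenever a keyframe arrives
 * (or when the list is still empty). */
static void ffmpeg_save_packet(struct gop_list *gl, int pkt, int is_key)
{
    if (is_key || gl->tail == NULL) {
        struct gop_node *g = calloc(1, sizeof(*g));
        if (gl->tail) gl->tail->next = g; else gl->head = g;
        gl->tail = g;
    }
    struct pkt_node *p = calloc(1, sizeof(*p));
    p->pkt = pkt;
    struct gop_node *g = gl->tail;
    if (g->last_pkt) g->last_pkt->next = p; else g->pkts = p;
    g->last_pkt = p;
}
```

Feeding it ten packets with keyframes at 0 and 5 yields two GOPs of five packets each; discarding a stale GOP is then just unlinking one gop_node and freeing its packet list.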

Now I'm trying to figure out how to integrate this with Motion's regular processing, as well as the event-movie logic. At least these two bits are needed:

Comments and advice are welcome.

iskunk commented 7 years ago

Still trying to figure out the integration. One of the things I really dislike about this code is how it sometimes passes images around as straight arrays of char, with no metadata. In particular, the handoff of images from netcam->latest to cnt->current_image is like this (see netcam_next_rtsp()) and there's no clean way to send additional data (like the associated GOP) along with that.

(Then there's how when Motion does use a struct to store the image, it sometimes uses struct image_data, and sometimes struct netcam_image_buff...)

Here's the strategy I've got so far:

  • Add a pointer to struct image_data referencing the GOP that this image is a part of
  • Expire GOPs in a call from mlp_resetimages(), as that appears to be where old frames get dropped from the ring buffer, using those GOP pointers
  • Change event_image_detect(), event_extpipe_put() and event_ffmpeg_put() to take a struct image_data instead of just the raw image data, so we can get the GOP pointer for the image. (All three need changing because they all correspond to EVENT_IMAGE_DETECTED)
  • Make event_ffmpeg_put() spit out the entire GOP for a frame when in passthru mode. For subsequent frames, either don't write out the same GOP again, or write out the remainder of the GOP if it's not yet complete.
  • Not even thinking about thread safety yet, obviously :-]

I could sure use some guidance from those more familiar with this code. The architecture of the whole thing is really showing its age, and there's probably going to be some ugliness however this is done. Just need to find the least-ugly way...

hyperbolic2346 commented 7 years ago

I ran into the same kind of issues when I was looking at using a lower resolution buffer for motion detection, but writing out the full-size buffer as the video. I pretty much gave up, as without some major architectural changes it just wasn't going to work well. I don't feel that I am in a position to make such architectural decisions either.

iskunk commented 7 years ago

At worst, it should be possible to hack in new features like this via brute force---in my case, I can do e.g. netcam->cnt->imgs.image_ring[] and pretty much frob anything in the context I want. But of course, it'll be ugly. And it highlights the need for a new architecture all the more.

To everyone here: What if I were to open a new issue for discussion of what a next-gen Motion architecture would be like? I have some ideas for that, but an important preliminary step is coming up with the requirements. We'd want to support not just the current functionality, but also future functionality---so we'd want to think of what cool whiz-bang features we'd want to be able to implement in the future without architectural awkwardness like we're seeing here.

For example, here's one: A stereoscopic (3-D) location detection algorithm, that requires two parallel camera streams to operate. We can't do this right now even if we were able/willing to hack it, because Motion analyzes camera streams independently; there's no provision to deeply connect the analysis of two streams in this way. So now, having thought of that, we can ensure that a new architecture doesn't have the same limitation.

(Oh, and that presumes another requirement: supporting more than one detection algorithm in the first place. Got a bit ahead of myself there :-)

And so on. It's a lot easier to come up with a future-proof design if you can figure out all the odd corner cases beforehand.

iskunk commented 7 years ago

Okay, I think I've found a way to square this circle. Here is the WIP. (As before; compile no, review yes)

Items of note:

A side effect of this approach is that we are separating the packets into individual frames, so the netcam_buffs and image_datas will each have maybe 2-3 packets at most. It's important to note that splitting the packets in this way is not very sensible/useful in itself; the reason for doing it here is so that the set of packets that are being stored in general lines up tightly with the image ring buffer. (My initial approach was to have a separate buffer of GOPs/packets, which was going to need to be kept in sync with the ring buffer.)

iskunk commented 7 years ago

I think I've found what @Mr-Dave meant when he said

We also can not guarantee that we are always grabbing the frames at the exact same rate that they are being provided by the camera.

What happens is that the netcam code updates netcam->latest at the rate the frames come in from the camera, but the main loop (via mlp_capture()) grabs frames (from netcam->latest) at its own independent rate. So if the camera stream starts lagging, the main loop starts re-capturing the same frame over and over. And that's not going to work for passthru, because each frame has to be written out only once, as it is in the source stream.

What I did to resolve this was add a pthread_cond_wait(&netcam->pic_ready, &netcam->mutex) in netcam_next_rtsp(), so that the main loop actually waits for the next frame to come in. (Curiously, this pic_ready condition variable, which came in so handy for this purpose, was not waited on anywhere in Motion until I added this bit.) This should probably really be a pthread_cond_timedwait() so it doesn't wait forever, and we'll need to figure out exactly what should happen when the camera stream lags too much. But for now, provided the lag stays under control, this seems to do the trick.

I have the code compiling, running, and actually not crashing. It's writing out movie files, but they're currently not playable (i.e. they cause MPlayer to segfault). Will need to figure out the FFmpeg output settings, as there's probably some mismatch between the packets and the container they're being put into.

jogu commented 7 years ago

Yes - the motion detection thread may be running faster or slower than the stream from the camera.

The change you suggest with a condition variable is good regardless, as it prevents the motion loop running when there's nothing it can sensibly do; it may need a few tweaks to make sure motion can still shut down reliably & cleanly.

We need to be careful not to bake in any assumptions that the motion detection thread will process every frame from the camera - on many people's setups, the camera will be streaming data faster than the server can process it, i.e. the motion detection might only process every other frame. (This is why in an earlier comment I'd suggested that the motion detection explicitly tell the rtsp code when it's done with a frame, as that would be the point at which we can discard all earlier frames - except for any we need to keep for 'pre-record'.)

jogu commented 7 years ago

See https://github.com/Motion-Project/motion/issues/225 for discussion about pic_ready btw (which agrees with what you've suggested).

iskunk commented 7 years ago

Re condition variable: That's what I was thinking---aside from the cleanup/shutdown stuff, what's the motion loop going to do when there's no new frame?

I don't think there should be an issue with motion detection analyzing every frame, as the existing logic that (un)sets cnt->process_thisframe should continue to work. You can't drop frames from the recorded video, but you already said that was a given anyway :-)

Re discarding earlier frames and pre-record: Because what is written out to a movie---including pre-record---is basically everything in the ring buffer (plus additional frames as the motion continues), the approach that I found easiest was just to store the packets with their associated frames in the ring. (Note that the only place where the pre_capture variable is used is in sizing the ring buffer, so the ring size is meaningful.) The only extra piece you need then are the packets needed to complete the first GOP.
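The "complete the first GOP" step can be sketched as a simple scan: the movie has to start on a keyframe, so before writing out the ring-buffer contents we back up from the oldest buffered frame to the most recent keyframe at or before it. This is illustrative only (Motion's real code would work on AVPackets and their keyframe flags):

```c
/* Sketch: find where to start writing so the first emitted packet is a
 * keyframe, giving the decoder a complete first GOP. Illustrative only. */

/* key[0..n-1] follow capture order; key[i] is nonzero if packet i starts
 * a GOP. 'oldest' is the index of the oldest frame kept for pre-record.
 * Returns the index to start writing from, or -1 if no keyframe exists
 * at or before 'oldest' (i.e. the first GOP cannot be completed). */
static int first_gop_start(const int *key, int n, int oldest)
{
    if (oldest < 0 || oldest >= n)
        return -1;
    for (int i = oldest; i >= 0; i--)
        if (key[i])
            return i;
    return -1;
}
```

In practice this means keeping a few extra packets beyond the pre-capture window, back to the last keyframe, so the scan always succeeds.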

(Which is not to say that this is a great approach in itself; just that it plays well with the existing architecture. The idea I had at first, about storing GOPs explicitly, would have needed a lot of glue code and just wasn't the way to go here.)

Thanks for pointing me to #225; that basically nails the question I had... though I wish the answer were better than "waiting for a notification doesn't seem right." Probably has to do with the shutdown issues you mentioned, i.e. the motion loop not being responsive and in control.

If anything, it would be possible to do this without a hard condition-variable wait... just have the motion loop keep spinning (skipping over all the image stuff) and trying vid_next() repeatedly without blocking. That would require more changes, however. And who knows, maybe the netcam loop will need a wait, too, in case the motion loop isn't grabbing the frames quickly enough...

Current status: Very, very close. The right packets are getting written out, but the resulting movie file still crashes MPlayer. I'm following the FFmpeg remuxing.c example, which is basically exactly what we're doing with passthru, and hoping the working-code vibes eventually transfer over...
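One thing remuxing.c does on every packet is rescale pts/dts/duration from the input stream's time base to the output stream's before writing; a mismatch there is a classic cause of broken output. Below is the core arithmetic written out by hand as a sketch (FFmpeg's av_packet_rescale_ts()/av_rescale_q() additionally handle AV_NOPTS_VALUE, rounding modes, and overflow):

```c
/* Sketch of time-base rescaling, the arithmetic behind FFmpeg's
 * av_rescale_q(). Simplified: ignores AV_NOPTS_VALUE and 64-bit overflow. */
#include <stdint.h>

struct timebase { int num, den; };   /* stand-in for AVRational */

/* Convert a timestamp counted in 'in' units to 'out' units:
 * ts * (in.num/in.den) / (out.num/out.den), rounded to nearest. */
static int64_t rescale_ts(int64_t ts, struct timebase in, struct timebase out)
{
    int64_t a = (int64_t)in.num * out.den;
    int64_t b = (int64_t)in.den * out.num;
    return (ts * a + b / 2) / b;     /* round half up */
}
```

For example, an RTSP camera typically stamps packets in a 1/90000 time base, while an MP4 stream might use 1/1000; writing the 90 kHz values into the container unscaled yields wildly wrong timing.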

iskunk commented 7 years ago

FINALLY. The thing is writing out viewable, only slightly garbage-y videos!

What happened is that event_gap was set to the default of 60, and because my test camera is watching a busy-ish scene, the MP4 file was never getting the trailer written to it---which apparently is kind of a requirement for MPlayer not to crash ^_^

Now, to figure out what's going on with the slight garbage-y-ness, and make sure the right thing happens GOP-wise for videos with gaps...

jogu commented 7 years ago

That sounds awesome, nice one @iskunk!

danboid commented 7 years ago

Hi all

I just got my AUSDOM 1080p USB webcam (for less than £20, new!) today and I want an app (for Linux and FreeBSD, ideally) that will detect motion and then start recording the raw MJPEG stream from my webcam without reencoding. I do not want any timestamps or motion boxes - this is what is being worked on here, right?

How close are we to seeing this feature in motion? Is there not another, existing open source app that can do this already?

Finally, is there a way to disable ALL text overlays in Motion's video output files? I have commented out all of the text-related options in my motion.conf file, but Motion is still adding a timestamp to the bottom right corner of every video it outputs.

Thanks

jogu commented 7 years ago

Hi @danboid. The development here is (for now at least) really only about RTSP network cameras that output h.264/mp4, so it's not applicable to your situation. Please ask your other questions on the motion mailing list - https://lists.sourceforge.net/lists/listinfo/motion-user