FFmpegInteropX

FFmpeg decoding library for Windows 10 UWP and WinUI 3 Apps

Connection to DASH streams is slow #168

Open lukasf opened 4 years ago

lukasf commented 4 years ago

Adding this here as a discussion point. There are a few things that come together, leading to the very slow connection time for DASH streams.

1. AVFormatContext initialization:

When creating the AVFormatContext, FFmpeg wants to fully initialize all streams contained within, including detailed codec parameters. At least one complete stream header must be parsed for each video and audio stream.

The top-level MPD header already contains basic information about all the streams. But it is not enough to fully initialize the FFmpeg streams.

There is no way to change the initialization mechanism. It would be great to have some "delay-init" feature, where you could start with only a basic stream description and then call a method to do the full initialization once it is needed.

Also, our MediaStreamSource needs some video parameters to be set on startup. Theoretically, it might be possible to set or change these parameters during playback as well, but we have never tried that, so I am not sure if it would really work.
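For reference, here is a minimal sketch of the standard FFmpeg open sequence (plain C API, no FFmpegInteropX specifics), with the expensive step marked:

```cpp
extern "C" {
#include <libavformat/avformat.h>
}

AVFormatContext* OpenSource(const char* url)
{
    AVFormatContext* ctx = nullptr;

    // Downloads and parses the MPD manifest; relatively cheap.
    if (avformat_open_input(&ctx, url, nullptr, nullptr) < 0)
        return nullptr;

    // Fully initializes every stream, including detailed codec parameters.
    // For DASH this is where dashdec downloads segments for each stream,
    // so most of the connection time is spent here.
    if (avformat_find_stream_info(ctx, nullptr) < 0) {
        avformat_close_input(&ctx);
        return nullptr;
    }
    return ctx;
}
```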

2. dashdec video stream initialization (problem 1):

Each video substream usually starts with an init segment, followed by the actual video data segments. The way dashdec does the initialization is not optimized for that: to probe the stream format, it reads a packet of a fixed size, which usually means reading the init segment plus parts of the first data segment. So two HTTP requests are made, although I think the init segment alone should contain everything that is needed to set up the stream.
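As a side note, the amount of data FFmpeg reads during probing is governed by the generic probesize and analyzeduration options. Whether dashdec forwards them to its substreams I cannot say, so this is only a knob to experiment with, not a fix for the behavior described above:

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/dict.h>
}

AVFormatContext* OpenWithSmallProbe(const char* url)
{
    // Experimental only: lower the generic probing limits before opening.
    AVDictionary* opts = nullptr;
    av_dict_set(&opts, "probesize", "32768", 0);        // max bytes read while probing
    av_dict_set(&opts, "analyzeduration", "500000", 0); // max microseconds analyzed

    AVFormatContext* ctx = nullptr;
    avformat_open_input(&ctx, url, nullptr, &opts);
    av_dict_free(&opts);
    return ctx;
}
```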

Yesterday evening I managed to change the behavior of dashdec so that it does the probing on the init segment alone. For the stream I used, this cut the initialization time from 6.5s to 3.5s. It might not work for all streams (it depends on how the stream is actually set up), but it could potentially cut init times almost in half. I need to test this a bit more with different streams, then I can provide a test build here @mediabuff.

Even if it works, we'd have to get this into the official FFmpeg repo. Not sure how difficult that would be.

3. dashdec substream initialization (problem 2):

The other problem with dashdec is that it initializes all video streams sequentially. This is of course a major problem, since it makes connection times slower and slower the more video streams a source contains. Theoretically, this could be done in parallel (see the sketch below). But multi-threading in C is rather painful, even more so in a cross-platform lib such as FFmpeg, where you don't even have basic synchronization primitives such as semaphores. I might be able to hack together a solution, but chances are high that it would not be accepted.
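For illustration only, the parallel variant could look like this; I am writing it in C++ for brevity, while inside FFmpeg it would have to be plain C on top of FFmpeg's own thread wrappers, which is exactly the painful part. Representation and open_substream are hypothetical stand-ins for dashdec's per-representation setup:

```cpp
#include <thread>
#include <vector>

struct Representation { /* per-stream state, hypothetical */ };

// Hypothetical stand-in for the per-representation init dashdec runs today.
void open_substream(Representation& rep) { /* probe init segment etc. */ }

void InitSubstreamsParallel(std::vector<Representation>& reps)
{
    std::vector<std::thread> workers;
    workers.reserve(reps.size());
    for (auto& rep : reps)
        workers.emplace_back([&rep] { open_substream(rep); });
    for (auto& worker : workers)
        worker.join(); // total wait ~ slowest stream, not the sum of all streams
}
```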

If we had solutions for both dashdec problems in place, we could get near the connection times that browsers have (maybe plus 0.5-1s).

brabebhin commented 4 years ago

I wonder if it would be possible to use a custom IO context to optimize things. We already have one for exposing the async FileRandomAccessStream as a synchronous stream.
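For anyone following along, the general avio_alloc_context pattern looks roughly like this minimal sketch (not our actual implementation; MyStream is a hypothetical stand-in for the stream adapter):

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/mem.h>
}

// Hypothetical synchronous wrapper around the async WinRT stream.
struct MyStream { int Read(uint8_t* buf, int size); };

static int ReadPacket(void* opaque, uint8_t* buf, int bufSize)
{
    auto stream = static_cast<MyStream*>(opaque);
    int bytes = stream->Read(buf, bufSize);
    return bytes > 0 ? bytes : AVERROR_EOF; // FFmpeg expects AVERROR_EOF at end
}

AVFormatContext* OpenWithCustomIO(MyStream* stream)
{
    constexpr int bufferSize = 16 * 1024;
    auto buffer = static_cast<unsigned char*>(av_malloc(bufferSize));

    // Read-only context: all reads are routed through ReadPacket.
    AVIOContext* avio = avio_alloc_context(buffer, bufferSize, 0,
                                           stream, ReadPacket, nullptr, nullptr);
    AVFormatContext* ctx = avformat_alloc_context();
    ctx->pb = avio;

    if (avformat_open_input(&ctx, nullptr, nullptr, nullptr) < 0)
        return nullptr; // note: real code must also free avio/buffer on failure
    return ctx;
}
```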

lukasf commented 4 years ago

Updated top post. Custom IO won't work here, because for DASH there are actually multiple connections open at the same time, one for each stream.

brabebhin commented 4 years ago

Hmm, another idea I just had...

What if we do the stream multiplexing ourselves instead? We would create the MSS with the first stream and then attach subsequent streams as we go along.

I am aware this would require quite the effort on our side.
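Roughly like this C++/WinRT sketch, assuming MediaStreamSource.AddStreamDescriptor (available on newer SDKs) actually works once playback has started, which is the untested part:

```cpp
#include <winrt/Windows.Media.Core.h>

using namespace winrt::Windows::Media::Core;

// Start playback with only the first stream available...
MediaStreamSource CreateWithFirstStream(VideoStreamDescriptor const& first)
{
    return MediaStreamSource{ first };
}

// ...and attach further streams once their init segments are parsed.
// Whether the pipeline honors this during playback is the open question.
void AttachStream(MediaStreamSource const& mss, IMediaStreamDescriptor const& next)
{
    mss.AddStreamDescriptor(next);
}
```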

lukasf commented 4 years ago

I am not sure if we can easily change the media type of our video stream. The AudioStreamDescriptor.EncodingProperties docs say that parameters can be changed at any time, but the VideoStreamDescriptor.EncodingProperties docs do not say so, and we never actually tried to change the media type during playback for either audio or video. Also, proper parsing of DASH is not a trivial task. I've already spent quite a lot of effort on this, even though I do not really use it myself.
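To make the uncertainty concrete: changing the media type would essentially mean mutating the descriptor's encoding properties mid-session, along these lines (untested C++/WinRT sketch):

```cpp
#include <winrt/Windows.Media.Core.h>
#include <winrt/Windows.Media.MediaProperties.h>

using namespace winrt::Windows::Media::Core;

// Untested for video: mutate the descriptor's properties during playback
// and hope the pipeline picks the change up.
void UpdateVideoFormat(VideoStreamDescriptor const& descriptor,
                       uint32_t width, uint32_t height)
{
    auto props = descriptor.EncodingProperties();
    props.Width(width);
    props.Height(height);
}
```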

The changes I made to dashdec did not work out as expected, because FFmpeg would still request data from each stream at a later point. The speed improvements I saw must have been coincidence (the server being faster than usual). I am trying a second patch for the mov (mp4) parser to avoid that second pitfall, but I am still fighting with that one. The more complex this gets, the lower the chances of actually getting it merged into FFmpeg. But we'll see. Doing stream initialization in parallel is also possible; maybe I should have started with that one first.

brabebhin commented 4 years ago

What if we try a different approach: instead of trying to make FFmpeg handle DASH properly, maybe we should use the existing AdaptiveMediaSource class (which is designed for this) and insert media transforms into the media pipeline to support custom decoding.
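The adaptive part itself is straightforward, something like this C++/WinRT sketch; the missing piece would be plugging FFmpeg in as a Media Foundation transform for codecs the system cannot decode:

```cpp
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Media.Core.h>
#include <winrt/Windows.Media.Playback.h>
#include <winrt/Windows.Media.Streaming.Adaptive.h>

using namespace winrt;
using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Media::Core;
using namespace winrt::Windows::Media::Playback;
using namespace winrt::Windows::Media::Streaming::Adaptive;

// Let the system handle manifest parsing, buffering and stream switching.
IAsyncAction PlayDashAsync(MediaPlayer player, Uri uri)
{
    auto result = co_await AdaptiveMediaSource::CreateFromUriAsync(uri);
    if (result.Status() == AdaptiveMediaSourceCreationStatus::Success)
    {
        player.Source(MediaSource::CreateFromAdaptiveMediaSource(result.MediaSource()));
        player.Play();
    }
}
```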

lukasf commented 4 years ago

Is there a way to insert custom transforms? I only know the audio and video effects API. These effects always assume input type == output type, so I think they cannot be used for decoding.

brabebhin commented 4 years ago

According to the great codex of Media Foundation, media transforms can be used for decoding: https://docs.microsoft.com/en-us/windows/win32/medfound/media-foundation-transforms

This seems to be the only way to insert custom decoding in the adaptive source. I am also pretty sure this is how the media extensions from the store work.

All we need is to figure out how to register them inside the app. Though I'm sure it's possible.

After 667 edits, here's the example

https://docs.microsoft.com/en-us/cpp/cppcx/wrl/walkthrough-creating-a-windows-store-app-using-wrl-and-media-foundation?view=vs-2019

brabebhin commented 4 years ago

https://docs.microsoft.com/en-us/uwp/api/windows.media.mediaextensionmanager

And this is how we register. Took some doing to find it...
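A hedged sketch of the registration (C++/WinRT); the activatable class id is a made-up placeholder, and the GUIDs are the standard MFVideoFormat_H264 / MFVideoFormat_NV12 subtypes:

```cpp
#include <winrt/Windows.Media.h>

using namespace winrt::Windows::Media;

void RegisterDecoder()
{
    MediaExtensionManager manager;
    manager.RegisterVideoDecoder(
        L"MyApp.FFmpegH264Decoder",                      // hypothetical class id
        winrt::guid{ 0x34363248, 0x0000, 0x0010,         // MFVideoFormat_H264
                     { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 } },
        winrt::guid{ 0x3231564E, 0x0000, 0x0010,         // MFVideoFormat_NV12
                     { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 } });
}
```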

I think we can implement some fancy stuff if we go lower level, maybe we can have our own gapless playback implementation.

I think we cannot override the system codecs, but that shouldn't be an issue, since we pass through most of these anyway. The only thing we would be missing is the custom implementation of audio effects. We could move these to IBasicAudioEffect, but at this point I am kind of scared, given the history of playback list and video effects.

I think that going down this MF way would eventually end up in having our own custom pipeline, which is not that bad of an idea.

lukasf commented 4 years ago

Oh right. Lol yeah I actually used that API in an older audio app to get FLAC, Ogg and ALAC playback, back in the old days of Windows 8. Totally forgot that it's there. But I also have to say that it was quite a struggle to get this working properly. It is a whole lot more complex than MSS.

One of the bigger problems I had was that there is no way to get the pipeline to do buffering. On slower devices or WP, this caused issues now and then. I fought with trying to implement some kind of async read-ahead buffer, but I failed to get it stable and deadlock-free. But that's long ago; we should probably be able to get it right today.

I think it was possible to override even system codecs with that API.

I still don't know if this is a good idea. It would definitely mean a whole lot of work, only for one single use case.

Creating our own pipeline with that would be possible, but we would also lose a lot of stuff, like file position/progress, stream selection, subtitles,...

brabebhin commented 4 years ago

Our objective is to plug the FFmpeg decoder into the adaptive media stream source. This should handle the async buffering by itself. At least I hope it works that way.

I still think going down this route is the better approach, because the adaptive stream source should also do stream switching on its own.

lukasf commented 4 years ago

Just saying, if you wanted to use that as a general approach for a media engine, you'd have to come up with some kind of buffering yourself, because the normal MediaPlayer won't do that for you unless you use MSS or AMSS.

Right now I am not up for the task. DASH is not my main use case. The amount of work required to get this done properly is not really justified for me.

Also, DASH is only specified to use either HEVC or H264. For both formats, there are free extensions available in the Store for SW decoding. So I do not even see the need to use our library for this. You'd still have to use the AdaptiveMediaSource for DASH, so it's not like you could use the same API with DASH and with normal streams or files, even if we implemented this.

brabebhin commented 4 years ago

Yes, you are right about that last point. I was just thinking how this could all work with our current setup. We would have to return either an AMSS or an MSS depending on the URL. It gets messy quickly.

I think support for multiple video tracks is as far as we should go. Trying to do DASH ourselves would be impossible from our perspective. Even if we got the whole decoder working, we would still have no way to get the performance metrics needed to trigger a stream switch.

If Microsoft allowed creation of an AMSS similar to MSS, it would be worth pursuing. It would probably be better to just migrate to AMSS in that case. But it seems this is not happening any time soon.

pashmak73 commented 4 years ago

Hi, I've tested this new feature with Instagram Live. It seems that at some point the streaming stops and no more data is received. Instagram uses HTTPS, but since the FFmpeg build doesn't support HTTPS, I removed the "S". I've tried VLC; VLC can play it, but with a lot more delay than FFmpegInteropX.