Open murillo128 opened 6 months ago
@drkron, any thoughts? I don't have much WebRTC experience.
pinging the usual suspects @fippo @henbos @alvestrand @jan-ivar @aboba
In practice, captureTime is only available via RVFC for locally captured frames. Are you looking to obtain it on the remote peer as well?
`captureTime` is already supported for remote WebRTC sources:
> **captureTime**, of type DOMHighResTimeStamp
>
> For video frames coming from a local source, this is the time at which the frame was captured by the camera. For video frames coming from remote source, the capture time is based on the RTP timestamp of the frame and estimated using clock synchronization. This is best effort and can use methods like using RTCP SR as specified in RFC 3550 Section 6.4.1, or by other alternative means if use by RTCP SR isn't feasible.
However, relying on RTCP RR or RRTR doesn't provide insightful information in an SFU scenario. Using the abs-capture-time value would be the best option in this case.
When it comes to the "remote" part of `captureTime`, the current definition is very difficult to utilize in practice:
a) RFC 3550 Section 6.4.1 provides the sender with RTT estimations, but what we need is RTT estimations at the receiver. This means that the receiver must either also send its own RTCP Sender Report, or send an RTCP Extended Report with a Receiver Reference Time Report Block and get a DLRR Report Block back (see RFC 3611).
Note that even if the receiver does send its own SR, it may still not be sufficient. WebRTC is (if I remember correctly) implemented to always put the Delay Since Last SR response into a separate RTCP Receiver Report even if the receiver is sending media. This leads us to the awkward situation where the receiver has to "cheat" and use RTT estimations and NTP timestamps from a completely different set of RTCP reports (i.e. from completely different SSRCs) than the ones involved with each video frame in `VideoFrameCallbackMetadata`.
b) As @murillo128 mentioned above, RFC 3550 Section 6.4.1 and its derivatives are unable to "look beyond" RTCP-terminating mixers.
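To make the receiver-side estimation in (a) concrete, here is a minimal sketch of the RTT arithmetic from an RRTR/DLRR exchange as defined in RFC 3611. All values are the middle 32 bits of an NTP timestamp (16.16 fixed-point seconds); the function name and argument order are mine, not from any spec:

```javascript
// RFC 3611: the receiver sends an RRTR block carrying its NTP time; the
// sender echoes it back in a DLRR block as LRR, together with DLRR (the
// delay since that RRTR was received). The receiver can then compute:
//   RTT = now - LRR - DLRR
// All three values are 16.16 fixed-point seconds, wrapping modulo 2^32.
function rttFromDlrr(nowNtp32, lrr, dlrr) {
  const rttFixed = (nowNtp32 - lrr - dlrr) >>> 0; // wrap-around-safe subtraction
  return rttFixed / 65536; // convert 16.16 fixed point to seconds
}

// Example: the sender echoed LRR = 0x10000000 with DLRR = 2 s (0x00020000),
// and the receiver's clock now reads 0x10028000 -> RTT = 0.5 s.
```

This is exactly the extra machinery the comment above describes: the receiver only gets an RTT estimate if this RRTR/DLRR round trip (or its own SR) actually happens.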
I believe that it would be more useful to redefine `captureTime` so that it's always based on timestamps from the capture system's reference clock rather than having to be re-synced to the "local" system's reference clock. This would leave things as-is for the "local" case while allowing `abs-capture-time` (and possibly "timestamps baked into video frame headers") to be used for the "remote" case.
For example, changing the text from:
> For video frames coming from a local source, this is the time at which the frame was captured by the camera. For video frames coming from remote source, the capture time is based on the RTP timestamp of the frame and estimated using clock synchronization. This is best effort and can use methods like using RTCP SR as specified in RFC 3550 Section 6.4.1, or by other alternative means if use by RTCP SR isn't feasible.
To say something along the lines of:
> For video frames coming from a local source, this is the time at which the frame was captured by the camera. For video frames coming from a remote source, this is the timestamp set by the system that originally captured the frame, with its reference clock being the capture system's NTP clock (the same clock used to generate NTP timestamps for RTCP sender reports on that system).
In an ideal world, `VideoFrameCallbackMetadata` would have a full set of properties for the "remote" case:
1) Capture timestamp from the original capture system's reference clock. This is what's proposed here.
2) Estimated clock offset between the original capture system's reference clock and the local system's reference clock. This lets us calculate the one-way delay when combined with (1).
3) CSRC or SSRC associated with (1) and (2). Knowing timestamps, but not knowing where they are coming from, is problematic when mixers are involved.
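As a sketch of why (1) and (2) together are enough, here is how they would combine into a one-way (capture-to-receive) delay. The function name and the sign convention for the offset are my assumptions, not from any spec:

```javascript
// Sketch: one-way delay from the proposed per-frame fields.
// Assumed sign convention: clockOffsetMs = (capture clock - local clock),
// so converting a capture-clock timestamp to the local clock means
// subtracting the offset.
function oneWayDelayMs(localReceiveTimeMs, captureTimeMs, clockOffsetMs) {
  const captureTimeLocal = captureTimeMs - clockOffsetMs;
  return localReceiveTimeMs - captureTimeLocal;
}
```

With the opposite sign convention for the offset, the subtraction simply flips; the point is that both a raw capture timestamp and a separately exposed offset estimate are needed to recover the delay.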
This is basically `RTCRtpContributingSource` but on a per-frame basis:
The neat thing (when it works) with the current definition is that all timestamps use the same reference and can be compared to `performance.now()`. This makes it very simple to calculate glass-to-glass delay, receive-to-render delay, etc.
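For instance, inside a `requestVideoFrameCallback` handler the delays fall out of simple subtraction, since `captureTime`, `receiveTime`, and the callback's `now` argument all share `performance.now()`'s reference clock. The helper below is a sketch (only the field names come from `VideoFrameCallbackMetadata`; the actual callback only runs in a browser):

```javascript
// With the current definition, all rVFC timestamps share one reference
// clock, so each delay is a plain subtraction.
function frameDelays(nowMs, metadata) {
  return {
    captureToRender: nowMs - metadata.captureTime, // "glass-to-glass" delay
    receiveToRender: nowMs - metadata.receiveTime, // excludes network delay
  };
}

// Illustrative browser usage:
// video.requestVideoFrameCallback((now, metadata) => {
//   console.log(frameDelays(now, metadata));
// });
```

Redefining `captureTime` to the capture system's clock would break this subtraction unless the clock offset is exposed alongside it.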
I would suggest that `absoluteCaptureTime` be added next to the capture timestamp. This timestamp would then be the unaltered capture timestamp in the sender's NTP clock.
The current approach of using the RTCP SR synchronization timestamp for `captureTime` has two flaws:
In WebRTC we already have a working solution that would allow supporting it in both cases:
https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-capturetimestamp
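For context, webrtc-extensions already exposes `captureTimestamp` and `senderCaptureTimeOffset` on the per-source entries returned by `RTCRtpReceiver.getSynchronizationSources()`. The sketch below shows how those fields could be read; the helper name and the sign convention for the offset are my assumptions, and the fields are only populated when the abs-capture-time header extension is negotiated:

```javascript
// Sketch: estimating capture-to-now delay from the webrtc-extensions
// per-source fields. Browser-only in practice; `receiver` here is an
// RTCRtpReceiver (or anything with the same getSynchronizationSources()).
function estimateCaptureDelay(receiver, nowMs) {
  for (const src of receiver.getSynchronizationSources()) {
    if (src.captureTimestamp === undefined) continue; // extension not negotiated
    // Assumed sign convention: capture time in the local clock is
    // captureTimestamp + senderCaptureTimeOffset.
    const captureLocal = src.captureTimestamp + (src.senderCaptureTimeOffset ?? 0);
    return { ssrc: src.source, delayMs: nowMs - captureLocal };
  }
  return null;
}
```

Because these values come from the abs-capture-time header extension rather than from RTCP SR, they survive SFUs that do not terminate the extension, which is exactly the gap discussed above.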
Should we specify that if the abs-capture-time header extension is available, it should be used instead of the RTCP SR value?