Open hogliux opened 2 weeks ago
In this case, you can try the ST30_TX_FLAG_USER_TIMESTAMP flag for the st30 session. With it, the user provides the RTP timestamp (which reflects the sampling time) for each 1ms frame in the next_frame callback via st30_tx_frame_meta.
Hence, logically, the earliest an RTP packet can possibly be sent is after the last sample inside the RTP packet has been captured, i.e. 1ms after the time indicated by the RTP timestamp field (I'm assuming a ptime of 1ms here). With the currently faulty RTP timestamp field, a receiver seeing this packet on the wire will conclude, for example, that the last sample in the RTP packet was captured 1ms in the future, which doesn't make sense.
Great find. It can be fixed easily by changing the function tx_audio_pacing_time_stamp
in https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/lib/src/st2110/st_tx_audio_session.c#L236 from
uint64_t tmstamp64 = epochs * pacing->pkt_time_sampling;
to
uint64_t tmstamp64 = (epochs + 1) * pacing->pkt_time_sampling;
This change should fix the synchronization issues for your audio device.
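To make the effect of that one-line change concrete, here is a minimal standalone sketch. It is not MTL's code: the struct and both function names are hypothetical stand-ins, and only the field name pkt_time_sampling and the two expressions are taken from the comment above. It assumes pkt_time_sampling is the number of samples per packet (48 for a 1ms ptime at 48kHz).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pacing state. Only the field named in the
 * thread is modelled; the real MTL struct has more members. */
struct audio_pacing_sketch {
  uint32_t pkt_time_sampling; /* assumed: samples per packet, e.g. 48 */
};

/* Expression before the change: epochs * pkt_time_sampling. */
uint32_t time_stamp_before(const struct audio_pacing_sketch* p, uint64_t epochs) {
  assert(p->pkt_time_sampling > 0);
  uint64_t tmstamp64 = epochs * p->pkt_time_sampling;
  return (uint32_t)tmstamp64; /* RTP carries only the lower 32 bits */
}

/* Expression after the change: (epochs + 1) * pkt_time_sampling. */
uint32_t time_stamp_after(const struct audio_pacing_sketch* p, uint64_t epochs) {
  assert(p->pkt_time_sampling > 0);
  uint64_t tmstamp64 = (epochs + 1) * p->pkt_time_sampling;
  return (uint32_t)tmstamp64;
}
```

For any epoch index, the two results differ by exactly pkt_time_sampling samples (modulo 2^32), i.e. by one packet time.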
Another enhancement would be to provide an option that lets the user customize the RTP timestamp offset relative to the epoch. MTL provides such an option for ST20 in https://github.com/OpenVisualCloud/Media-Transport-Library/blob/main/include/st20_api.h#L1200, and we can apply a similar approach for audio.
Thank you @frankdjx and @ricmli for your suggestions. That's very useful to know. I've tested @frankdjx's suggested code change and it works. An option to customize the offset to the epoch would be even better, of course.
When inspecting the egress timings of RTP packets produced by MTL with Wireshark, I see that the RTP timestamp encoded in the RTP packet roughly matches the egress time of the packet:
Wireshark is set to show the time in SECS.NANOS since epoch, hence, nanos since epoch of the packet egress time (marked red above) is:
converted to samples (@ 48kHz):
printing the lower 32-bits of that number:
which is exactly the RTP timestamp field in that packet (also marked red above).
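The three steps above (nanoseconds since epoch, converted to samples, truncated to 32 bits) can be sketched as a small helper. This is an illustrative sketch only: the function name is made up for this example, and the inputs in the usage note are round numbers, not the values from the capture.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a time in nanoseconds since the epoch into a 32-bit RTP timestamp
 * at the given sampling rate. Whole seconds and the sub-second remainder are
 * converted separately so the intermediate products fit in 64 bits. */
uint32_t ns_to_rtp_timestamp(uint64_t ns_since_epoch, uint32_t sampling_rate_hz) {
  assert(sampling_rate_hz > 0);
  const uint64_t ns_per_sec = 1000000000ULL;
  uint64_t secs = ns_since_epoch / ns_per_sec;
  uint64_t frac_ns = ns_since_epoch % ns_per_sec;
  uint64_t samples = secs * sampling_rate_hz + frac_ns * sampling_rate_hz / ns_per_sec;
  return (uint32_t)samples; /* keep the lower 32 bits, as in the RTP header */
}
```

For example, 1ms (1000000 ns) at 48kHz gives 48 samples, and 100000 s gives 4800000000 samples, whose lower 32 bits are 505032704.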
However, I believe this to be incorrect:
Section 7.7.2 of the ST 2110-10 standard says:
Hence, logically, the earliest an RTP packet can possibly be sent is after the last sample inside the RTP packet has been captured, i.e. 1ms after the time indicated by the RTP timestamp field (I'm assuming a ptime of 1ms here). With the currently faulty RTP timestamp field, a receiver seeing this packet on the wire will conclude, for example, that the last sample in the RTP packet was captured 1ms in the future, which doesn't make sense.
Indeed, on the few hardware and virtual AES67 audio devices that we have access to, packets are sent out no earlier than 1ms after the time encoded in the RTP timestamp field.
This is causing synchronization problems for us with said audio devices.
Note that this distinction does not matter for video, as RTP packets typically only carry one frame (i.e. the first and last frame are identical anyway).
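The reasoning above can be written as a simple receiver-side check. This is a hedged sketch with hypothetical function names, not part of MTL or of any standard API. It assumes the egress time has already been converted to RTP units (samples, lower 32 bits) and that the RTP timestamp marks the first sample in the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed distance in samples from `from` to `to`. The unsigned subtraction
 * wraps modulo 2^32, so the result stays correct across timestamp roll-over
 * (assumes the usual two's-complement conversion to int32_t). */
int32_t rtp_timestamp_diff(uint32_t to, uint32_t from) {
  return (int32_t)(to - from);
}

/* Returns true if the packet left the sender before its last sample could
 * have been captured, i.e. less than one packet time after the instant given
 * by the RTP timestamp.
 *   rtp_ts:             RTP timestamp field of the packet
 *   egress_ts:          egress time in samples, lower 32 bits
 *   samples_per_packet: e.g. 48 for a 1ms ptime at 48kHz */
bool leaves_before_capture_complete(uint32_t rtp_ts, uint32_t egress_ts,
                                    uint32_t samples_per_packet) {
  assert(samples_per_packet > 0);
  return rtp_timestamp_diff(egress_ts, rtp_ts) < (int32_t)samples_per_packet;
}
```

A packet whose RTP timestamp equals its egress time is flagged by this check, while one sent a full packet time (48 samples) after its timestamp is not.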