garethsb closed this issue 2 years ago
Our advice is to mainly support codestream packetization (as is also done in TR-08). Slice-based packetization (in order or not) was defined for very specific use cases that are probably not going to be used in the context of AMWA NMOS.
In the activity group call today, we tentatively decided that it is simplest just to introduce Flow attributes and Receiver caps for these two parameters right now from v1.0.
Based on https://www.rfc-editor.org/rfc/rfc9134.html#section-4.3, I wonder if we should have one property rather than two, i.e. something like
"packet_transmission_mode": {
"type": "string",
"enum": [
"codestream_sequential",
"slice_sequential",
"slice_out_of_order"
],
"default": "codestream_sequential"
}
Second question: since these flags are part of the payload header (though not the RTP header...) defined by the RTP Payload Mapping, should the attribute be on the Sender rather than the Flow? If it would logically make sense to have one encoded Flow transmitted by two different Senders using different packetization, then I think the answer is yes.
If the SDP transport file describes these attributes as two values, doesn't it make it simpler for the "target" description to match the SDP? Will it be the first time that a Receiver Capability targets multiple SDP attributes at once?
Can a JPEG XS encoder produce a Flow that could be transmitted in different ways by various Senders, using various transports with various packetmode and transmode values? My understanding is that packetmode and transmode are tightly coupled to the method an encoder uses to produce a Flow, similar to having slices and out-of-order GOPs in H.264 and H.265... Because of that I would tend to put them as Flow attributes, but we are in the blurred region between transport and Flow.
> Will it be the first time that a Receiver Capability targets multiple SDP attributes at once?
Even Media Type requires comparison against two different parts of the SDP file actually! 😄
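To illustrate that point, a minimal sketch (the SDP snippet and parsing are illustrative assumptions, not from any spec): the media type is split between the m= line (top-level type) and the a=rtpmap attribute (subtype), so matching even the media type Receiver Capability already means looking at two parts of the SDP.

```python
# Illustrative SDP fragment for a JPEG XS stream; payload type 112 is arbitrary.
sdp = """m=video 5004 RTP/AVP 112
a=rtpmap:112 jxsv/90000
"""

# The top-level type ("video") comes from the m= line...
top_level = next(l.split()[0].split("=", 1)[1] for l in sdp.splitlines() if l.startswith("m="))
# ...while the subtype ("jxsv") comes from the a=rtpmap attribute.
subtype = next(l.split()[1].split("/")[0] for l in sdp.splitlines() if l.startswith("a=rtpmap"))
media_type = f"{top_level}/{subtype}"
```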
> If the SDP transport file describes these attributes as two values, doesn't it make it simpler for the "target" description to match the SDP? Will it be the first time that a Receiver Capability targets multiple SDP attributes at once?
It depends. On one hand, it is indeed simpler to map the transmode and packetmode from the SDP (as defined by RFC 9134) directly into NMOS. However, the issue is that these two boolean parameters (values 0 or 1) allow for an invalid combination, so out of 4 possible configurations, only 3 are valid: codestream packetization always implies sequentially ordered packets (in-order), while slice packetization allows for both in-order and out-of-order packets.
In that sense, @garethsb 's enum/string proposal seems safer/easier/better.
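That mapping can be sketched as follows (a hedged illustration: the enum value names follow @garethsb's proposal, the function name is made up, and the defaults follow RFC 9134's packetmode=0/transmode=1):

```python
def packet_transmission_mode(packetmode: int, transmode: int = 1) -> str:
    """Map the two RFC 9134 booleans onto a single enum value,
    rejecting the one invalid combination (codestream + out-of-order)."""
    if packetmode == 0:  # codestream packetization
        if transmode == 0:
            raise ValueError("invalid: codestream packetization implies sequential order")
        return "codestream_sequential"
    # slice packetization: both orders are valid
    return "slice_sequential" if transmode == 1 else "slice_out_of_order"
```

With a single enum, the invalid fourth combination simply has no representation, which is the "safer" aspect noted above.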
> Can a JPEG XS encoder produce a Flow that could be transmitted in different ways by various Senders, using various transports with various packetmode and transmode values? My understanding is that packetmode and transmode are tightly coupled to the method an encoder uses to produce a Flow, similar to having slices and out-of-order GOPs in H.264 and H.265... Because of that I would tend to put them as Flow attributes, but we are in the blurred region between transport and Flow.
This question can be answered both ways :) In essence, the RTP packetization of XS codestreams can always be done in any valid way (slice vs codestream, and slice-sequential or slice-out-of-order). So, it is possible to build a system that takes pre-encoded XS codestreams and then packetizes these in multiple ways to create different flows in parallel (going to different receivers).
However, the idea and use of a slice-based packetization mode is to allow parallelization in the XS encoder itself, so that it can generate encoded content on the slice boundaries of each video frame and send it out as soon as it is ready. This is in contrast to a heavily parallel XS encoder that needs to buffer encoded content (parts of the image) and then organize everything into a well structured XS codestream before it can send it out.
And, indeed, sending slices out of order makes more sense when the XS encoder is also configured to disable some dependencies between the slices (so that slices are independent of each other) (doable by things such as disabling vertical prediction and/or using column mode).
So, what I'm saying is that slice-based packetization mostly makes sense if it can be integrated with the actual XS encoding process. This is useful to further decrease the latency, but it shifts complexity to the decoder (which now has to puzzle all received slices back together). But your use case described in the question is certainly possible.
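The decoder-side "puzzle" can be sketched like this (purely illustrative: the `(frame_id, slice_index, data)` packet tuples are an assumption, not the RFC 9134 wire format):

```python
def reassemble(packets):
    """Buffer slices that may arrive out of order, grouped per frame,
    and emit each frame's slices back in sequential codestream order."""
    frames = {}
    for frame_id, slice_index, data in packets:
        frames.setdefault(frame_id, {})[slice_index] = data
    return {f: [slices[i] for i in sorted(slices)] for f, slices in frames.items()}
```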
Thanks, Tim, that very nicely explained what I thought. :-)
So, it seems we are in a grey area here between considering this/these properties to be of the Flow/format/encoder or the Sender/transport/packetizer. Putting the attribute(s) on the Sender seems theoretically right, but practically the encoder will almost always be involved so putting the attribute(s) on the Flow would represent that.
A similar decision is pending for audio. Since the same audio can be packetized differently in different streams, we think packet_time and max_packet_time logically belong on the Sender, but in common media processing frameworks this grouping of samples into packets may be done separately, before RTP packetization. In this case, we have given the Parameter Constraints URNs like urn:x-nmos:cap:transport:packet_time, and the indication of transport would therefore align well with a Sender attribute.

In brief, I want to propose a urn:x-nmos:cap:transport: Parameter Constraint for the packetization/transmission mode, and a Sender attribute, even though in practice the encoder is responsible for the implementation.
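For illustration, a hedged sketch of how a Receiver might advertise such a constraint alongside media type in BCP-004-01 style Receiver Capabilities; the exact URN suffix (packet_transmission_mode) is an assumption pending the outcome of this discussion:

```python
# Hypothetical constraint set for a Receiver that only handles the
# codestream/sequential mode; URN suffix and enum values are assumptions.
receiver_caps = {
    "constraint_sets": [
        {
            "urn:x-nmos:cap:format:media_type": {"enum": ["video/jxsv"]},
            "urn:x-nmos:cap:transport:packet_transmission_mode": {
                "enum": ["codestream_sequential"]
            },
        }
    ]
}
```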
I think we'll have to define more precisely what is Transport and what is Flow in order to operate in this grey area in a coherent way in the future... For example, if the criterion were "Can a byte stream described as a Flow be transmitted through various Transports unaltered?", would it identify what is a Flow and what is Transport? Unfortunately no, because the previous examples of audio and JPEG XS do not identify packet_time and packet_transmission_mode as transport concepts under this criterion :( What would such a criterion be?
I have the same goal and criterion, but I am confused by your conclusion... I think the audio Flow or JPEG XS codestream can be reconstructed after going through the transport, because the packetization IS a transport issue in both cases?
Maybe I'm wrong, but it seems to me that the concepts of packetmode and transmode are part of the ISO specification of JPEG XS and not specific to the RTP payload format specification (the RFC). As such, a byte stream produced by a JPEG XS encoder following the ISO specification, without considering the RTP payload specification, could be transmitted using various transports, and would then be described by the criterion as a Flow...
> Maybe I'm wrong, but it seems to me that the concepts of packetmode and transmode are part of the ISO specification of JPEG XS and not specific to the RTP payload format specification (the RFC). As such, a byte stream produced by a JPEG XS encoder following the ISO specification, without considering the RTP payload specification, could be transmitted using various transports, and would then be described by the criterion as a Flow...
This is not correct. The ISO specification does not define the packetmode or transmode concepts in any way; these are defined in RFC 9134. The RFC offers two packetization modes: codestream and slice.
However, slices usually depend on their respective previous slices (due to a prediction mechanism). Thus, if you want to use slice packetization of RFC 9134, typically the packets still need to arrive in order at the receiver. Otherwise, the receiver needs to buffer the slices for which it did not yet receive the previous ones. Moreover, the encoder also has to produce the slices in order (again to do the prediction). Thus, the default slice packetization is still sequential (transmode).
If however the encoder disables the slice prediction dependency, then it can encode in parallel all slices of an image/videoframe and push them out whenever ready. In this scenario, a decoder can receive and immediately decode the slices, without having to wait on the "previous" ones. For this reason, the out-of-order transmode exists.
But, in summary: transmode and packetmode are RFC 9134 specific and these concepts are never mentioned in the ISO XS standards.
Thanks for the detailed explanation. One more question: would it be true, then, that independently of how a JPEG XS encoder produces a byte stream, the transport layer could select either codestream or slice packetization? I would expect the answer to be yes if the packetization mode is purely a concept of the RTP payload format for JPEG XS. I think it could be true in theory, because the transport level could reorder the slices, but in practice I still see that an encoder is aware of the packetization mode, because it selects its behaviour regarding slices based on that information.
> true in theory, because the transport level could reorder the slices, but in practice I still see that an encoder is aware of the packetization mode
That's my understanding
Yes, that’s correct.
Resolved by #14.
The packetmode and transmode format-specific parameters for JPEG XS are defined in RFC 9134:

packetmode (K=0 codestream, K=1 slice)
transmode (T=0 any order, T=1 sequential order, the default)

These parameters could affect whether a particular Receiver can handle a given stream.
Packetmode and transmode have dependencies: the default packetmode is 0 (codestream), and this packetmode implies sequential transport mode (transmode 1). If packetmode is 1 (slice), then transmode can be either 0 (any order) or 1 (sequential, in order).
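That dependency check can be sketched when reading an RFC 9134 fmtp line (a hedged illustration: the "name=value; ..." parameter syntax and the function name are assumptions; the defaults follow the summary above):

```python
def parse_jxsv_fmtp(fmtp: str) -> tuple[int, int]:
    """Extract packetmode/transmode from an fmtp parameter string,
    applying the defaults (packetmode=0, transmode=1) and rejecting
    the invalid codestream + out-of-order combination."""
    params = dict(p.strip().split("=", 1) for p in fmtp.split(";") if "=" in p)
    packetmode = int(params.get("packetmode", 0))
    transmode = int(params.get("transmode", 1))
    if packetmode == 0 and transmode == 0:
        raise ValueError("invalid: packetmode=0 (codestream) requires transmode=1")
    return packetmode, transmode
```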
However, the default (codestream/sequential) is appropriate for most use cases, and VSF TR-08 currently only supports this default.
Therefore, it could be appropriate to hold off adding these as Flow (or Sender?) attributes and Receiver parameter constraints, and assume the default in both cases. In that case, do we want to be explicit about this in BCP-006-01 so that we have a path to adding these in the future?