cyanreg / avtransport

A new container/protocol for media transmission over networks
https://cyanreg.github.io/avtransport/
BSD 2-Clause "Simplified" License
26 stars 4 forks source link

Misc comments and proposed changes (part 1) #2

Closed robUx4 closed 2 years ago

robUx4 commented 2 years ago

As discussed, I had a look at the specification and have a few comments and some proposed (tiny) changes.

Implementations are allowed to only compare the first 4 bytes or 8 bytes to identify a Qproto session.

Can the raptor code give different values given the 4 bytes before are identical ? Is it "only 4 or 8" or "only 4 or only 8" bytes ? In the former case you may omit the "or 8 bytes" and explain that in the case of 4 bytes, the raptor code is not checked.

This signals the epoch of the timestamps, in nanoseconds since 00:00:00 UTC on 1 January 1970

In Matroska we use 2001/01/01 00:00:00 so that we are less limited by the UNIX times (and also it's the start of the millennium).

Raptor code to correct and verify the previous contents of the packet.

Given the position of the raptor code is defined by the Descriptor value, do you need to include the Descriptor bits in it ? Isn't it a waste of bits ?

This field MAY be zero, in which case the stream has no real-world time-relative context.

You mean the epoch field or the global_seq ? Given the next sentence I suppose it's the epoch field.

If this field is non-zero, senders SHOULD ensure the epoch value is actual.

"actual" is rather vague. The only coherence check that could stand on its own is that it's not in the future.

In general I don't understand the use of the time_descriptor given there are many cases where the value doesn't matter.

Monotonically incrementing per-packet global sequence number.

Since it's on 32 bits, what happens once one you are at 0xFFFFFFFF ?

If the latter (codec_id, timebase AND related_stream_id) are to change an end of streampacket MUST be sent first.

What happens if the end of streampacket is lost over UDP ?

derived_stream_id doesn't much description on how to use it.

If all bits in the mask 0x50fc are unset, related_stream_id MUST match stream_id, otherwise the stream with a related ID MUST exist.

And be a different ID ? Also the definition based on a mask is odd, because if you add stream_flags value that definition may not be true anymore.

Per-stream monotonically incrementing packet number.

Does it have to start at 0 ? Does it overflow ?

Implementations MUST feed the packets to the decoder in an incrementing order according to the dts field.

By the dts is not in the stream ? It should reference the more generic "coding order" as opposed to the "display order".

The lower 8 bits are used as the pkt_flags field.

This is odd, you may use two b(8) fields .

zlib data compression might be added, it usually gives a good ratio with subtitle content.

Negative offset of the previous index packet, if any, in bytes, relative to the current position.

From start of the Index packet or the beginning of the field ? Bytes excluded ? An example could clarify this.

Must be exact.

MUST be exact. ?

cyanreg commented 2 years ago

Hey, thanks a lot for taking a look.

Can the raptor code give different values given the 4 bytes before are identical ?

No.

Is it "only 4 or 8" or "only 4 or only 8" bytes ? In the former case you may omit the "or 8 bytes" and explain that in the case of 4 bytes, the raptor code is not checked.

I did this for compatibility reasons with lazy parsers who only ever check the first 4 bytes. I'll reword it to make it clearer.

In Matroska we use 2001/01/01 00:00:00 so that we are less limited by the UNIX times (and also it's the start of the millennium).

I'm keeping UNIX time, it's just more direct. 50 years don't really matter when you have ~500 years of absolute range.

Given the position of the raptor code is defined by the Descriptor value, do you need to include the Descriptor bits in it ? Isn't it a waste of bits ?

The descriptor code is included in the raptor code, but you raise a really good point - if the descriptor is invalid, indeed you can't know where the raptor code is, nor how long it is (some packet types have larger raptor codes). Moreover, since Raptor codes are not locally decodable, you couldn't just verify the descriptor.

What I think I'll do is I'll definite a minimum 32-bit raptor code that verifies 6x32-bit symbols. I'll move the most important packet info in those first 6 symbols, and position the raptor code to just after the descriptor and the stream ID (16+16 bits = one symbol). For shorter packets, I'll define a padding of zeroes that must be appended during raptor code computation. We don't waste bits from the raptor code - since decoders know there cannot be any errors in the padding, all correction will be done on however many symbols the data is. For any larger packets, there will be an additional raptor code at the very end to cover the rest of the data.

You mean the epoch field or the global_seq ? Given the next sentence I suppose it's the epoch field.

The epoch.

In general I don't understand the use of the time_descriptor given there are many cases where the value doesn't matter.

The idea is to enable synchronized presentation by giving a basis. If the basis is in the future, devices that get data sooner will wait. If it's in the past, devices will present as fast as possible and the epoch becomes more of a metadata. This assumes that all devices have a synchronized clock, but I think this is good enough to at least let several users around the world watch some content, whether local or a remote stream simultaneously. The reason why it applies in so few conditions is that one piece of feedback I got was that it's not relevant to real-time minimal-latency streams, so I tried to stress that on. I'll need to think of a better wording. Perhaps by giving an example like I just did.

Since it's on 32 bits, what happens once one you are at 0xFFFFFFFF ?

You roll over to 0x0.

Stream registration initializes the seq_number used for the stream to 0x0.
The sequence number MUST overflow gracefully back to 0x0.

I'll have to add one for the global counter too, or get rid of the stream-local counter altogether. I was waiting for Meuuuh from upipe to answer, but he seems away, so I think I'll get rid of all stream-local counters and replace them with a single counter. That was kieran's advice too.

What happens if the end of streampacket is lost over UDP ?

You detect there's been a dropped packet, and you request a new one. You may try to decode anyway until then, and if it doesn't work, you wait.

derived_stream_id doesn't much description on how to use it.

The way you use it is given in the stream flags.

And be a different ID ? Also the definition based on a mask is odd, because if you add stream_flags value that definition may not be true anymore.

Good point, I'll change the wording for this.

Does it have to start at 0 ? Does it overflow ?

It doesn't have to start at 0, but I'll mention it should. It does overflow and wrap around to 0.

By the dts is not in the stream ? It should reference the more generic "coding order" as opposed to the "display order".

It is in the stream, it's just at the start of H264 packets rather than in all streams. Wasting 64-bits in each data packet even though they're not needed for audio or non-MPEG streams was too excessive IMO. I'll change the wording somehow to add "coding order".

This is odd, you may use two b(8) fields .

Err, no? The first 8 bits of the descriptor are fixed. The latter 8 bits are used as flags.

zlib data compression might be added, it usually gives a good ratio with subtitle content.

I added zstd compression instead, to data, ICC and font packets, because it compresses better and faster than zlib. All demuxers will be required to implement decompression, so I think it's best to minimize the methods to keep the code down. Zstd was just the best out of what I looked, it offered better compression and speed than brotli. I asked around for whether any devices have zlib hardware decoding, and the answer was no (Intel Quickshare coprocessors apparently have hardware zlib decompression, but no one uses it), so I think zlib has no advantage, other than its a very widespread library. I did check how widespread libzstd is via Debian popcon, and the answer was close to 80% of installs have it, which I think makes it justified to abandon zlib.

From start of the Index packet or the beginning of the field ? Bytes excluded ? An example could clarify this.

Good point, I'll make this more explicit. I'm not happy with it either. I also want to work in subtitle preroll in the definition, because mpv demonstrated how useful that is.

Must be exact.

MUST be exact. ?

I don't even know what I meant to say by adding this. I suppose "must point to exactly the descriptor of a packet"? Not sure, but I'll go over index packets and specify them properly.

I'll ping you again once I've fixed everything, for now I'll merge your PR as-is.