KallDrexx / rust-media-libs

Rust based libraries for misc media functionality

Serialization of larger payloads can lead to incorrect chunks #21

Closed pvheiningen closed 3 years ago

pvheiningen commented 3 years ago

With increased bitrates, the rtmp stream created by the mio-rtmp-server is no longer playable in ffmpeg.

Steps to Reproduce

  1. Build the latest version of mio-rtmp-server (v0.1.1) and run it.
  2. Create a test stream on the same device that is running the RTMP server. We tested with the following commands:
  3. Connect to the stream with ffplay: ffplay -loglevel trace -noinfbuf rtmp://IPADDRESS/live/test
  4. The following errors are observed:
    [rtmp @ 0x7f2c78001780] No default whitelist set
    [tcp @ 0x7f2c78002040] No default whitelist set
    [tcp @ 0x7f2c78002040] Original list of addresses:
    [tcp @ 0x7f2c78002040] Address 192.168.2.138 port 1935
    [tcp @ 0x7f2c78002040] Interleaved list of addresses:
    [tcp @ 0x7f2c78002040] Address 192.168.2.138 port 1935
    [tcp @ 0x7f2c78002040] Starting connection attempt to 192.168.2.138 port 1935
    [tcp @ 0x7f2c78002040] Successfully connected to 192.168.2.138 port 1935
    [rtmp @ 0x7f2c78001780] Handshaking...
    [rtmp @ 0x7f2c78001780] Type answer 3B vq=    0KB sq=    0B f=0/0   
    [rtmp @ 0x7f2c78001780] Server version 128.0.7.2
    [rtmp @ 0x7f2c78001780] Proto = rtmp, path = /live/test, app = live, fname = test
    [rtmp @ 0x7f2c78001780] New incoming chunk size = 4096
    [rtmp @ 0x7f2c78001780] Window acknowledgement size = 1073741824
    [rtmp @ 0x7f2c78001780] Max sent, unacked = 2500000
    [rtmp @ 0x7f2c78001780] Creating stream...
    [rtmp @ 0x7f2c78001780] Sending play command for 'test' f=0/0   
    [rtmp @ 0x7f2c78001780] Unknown packet type received 0x000B f=0/0   
    Last message repeated 9 times
    [tcp @ 0x7f2c78002040] RTMP packet size mismatch 9851467 != 14907160

Detailed Description

When the bitrate of the original stream is lowered (to 3 Mb/s), the resulting output stream plays back correctly.

We dumped the logs with the --log-io option in the mio-rtmp-server and parsed them with the rtmp-log-reader v0.1.0. The input logs could be parsed correctly, but parsing the output logs caused a ChunkDeserializationError:

    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ChunkDeserializationError { kind: NoPreviousChunkOnStream { csid: 7 } }', tools/rtmp-log-reader/src/main.rs:50:70
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

1624005090-1.rtmp.input.log

1624005096-2.rtmp.output.log

It seems like there is a header packet containing an invalid chunk stream id, which causes both the parser and ffmpeg to fail.

Thanks in advance for any advice/help!

pvheiningen commented 3 years ago

I traced it down to https://github.com/KallDrexx/rust-media-libs/blob/72957971b64795188d541df55a1287efd4b5ecf7/examples/mio_rtmp_server/src/connection.rs#L198. The size of the bytes object and the actual bytes_sent don't match.
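
As a point of reference, here is a minimal sketch of the kind of partial-write behaviour being described; the send helper and its names are illustrative, not the code at that line:

    use std::io::Write;

    // Illustrative only: write() may accept fewer bytes than offered, especially
    // on a non-blocking socket whose send buffer is nearly full.
    fn send(stream: &mut impl Write, bytes: &[u8]) -> std::io::Result<()> {
        let bytes_sent = stream.write(bytes)?; // may be < bytes.len()
        if bytes_sent != bytes.len() {
            // If the remainder is never re-sent, the receiver sees a truncated RTMP
            // chunk and misreads the following video bytes as a new chunk header.
            eprintln!("partial write: {} of {} bytes", bytes_sent, bytes.len());
        }
        Ok(())
    }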

KallDrexx commented 3 years ago

Thanks for the extremely thorough report.

It definitely sounds like a mismatch somewhere: it's saying a chunk has a max size of X but writing chunks that are larger than that.

I'm a bit busy, but I can try to repro and take a look this weekend.

pvheiningen commented 3 years ago

FWIW: I applied the following patch to write out the full bytes object to the logs. With that change, the parser can read the logs without errors. Now I'm getting the Full write buffer! error.

connection.patch.txt

KallDrexx commented 3 years ago

Yeah, I think the connection.rs issue is a symptom, not a cause.

The steps I would take would be to go through the rtmp log parser, look for an outbound "Set chunk size" message, and see what max chunk size the mio server is telling ffplay to expect. Then pin down the RTMP message this library is serializing and look at the chunks it produces. I'm going to guess the chunks being produced are larger than the maximum we told ffplay we would use.

KallDrexx commented 3 years ago

So I wasn't able to reproduce the issue locally (on the latest commit at least). I used the ffmpeg testsrc2 source that you used (though with rtmp://localhost/live/test, whereas your example had rtmp://localhost/test) and was able to play it fine in ffplay. I also used ffmpeg to stream 4K VBR Big Buck Bunny (which gets up to 22 Mbps at points) and a transcoded BBB at 8 Mbps.

I went through your output stream and it's interesting to look at (and the problem isn't what I was expecting). What happened is that it tried to send the first video key frame, which is 111,786 bytes. Since the stream starts with a SetChunkSize of 4096, it needs to break that frame up into 28 RTMP chunks. It does this correctly until the 27th or 28th chunk.

The 27th chunk starts at byte offset 107,391 and claims to be a type 3 header (a single 0xc4 byte on csid 4), so it should be followed by at most 4096 bytes of payload. The very next byte after that (offset 111,488) has a value of 0x47, which decodes to a type 1 header on csid 7.

However, looking at the bytes preceding this, the same 32 bytes also appear in the input right before a 0x47 byte. This signals to me that the 0x47 isn't an RTMP chunk header but is actually part of the video data, which means that for some reason the serialized RTMP chunk ended up with more than 4096 bytes in its payload. That would of course cause ffplay to be unable to read it (for the same reason the rtmp-log-reader tool fails to read it).
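
As a reference for those header bytes, here is a minimal sketch of how an RTMP basic header byte (one-byte form) splits into format and chunk stream id, plus the chunk-count arithmetic for that key frame; this is just the bit layout from the RTMP spec, not this library's parser:

    // Decode an RTMP basic header byte (one-byte form): the top two bits are the
    // chunk header format (0-3), the low six bits are the chunk stream id.
    fn decode_basic_header(byte: u8) -> (u8, u8) {
        (byte >> 6, byte & 0b0011_1111)
    }

    fn main() {
        assert_eq!(decode_basic_header(0xc4), (3, 4)); // type 3 continuation on csid 4
        assert_eq!(decode_basic_header(0x47), (1, 7)); // type 1 header on csid 7

        // A 111,786-byte key frame with a 4096-byte max chunk size needs
        // ceil(111_786 / 4096) = 28 chunks.
        assert_eq!((111_786u32 + 4096 - 1) / 4096, 28);
    }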

That shouldn't happen because of https://github.com/KallDrexx/rust-media-libs/blob/master/rtmp/src/chunk_io/serializer.rs#L107 so I'm unsure of where to go from here.
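
For what it's worth, the general shape of that guard is a split that can never produce an over-long payload; a simplified sketch under that assumption (not the library's actual serializer, and ignoring the full header formats beyond the continuation byte):

    // Simplified sketch: split a message payload into RTMP chunks whose payloads
    // never exceed the negotiated max chunk size. Chunks after the first would
    // normally carry only a type 3 continuation header (0xc0 | csid).
    fn split_into_chunks(payload: &[u8], max_chunk_size: usize, csid: u8) -> Vec<Vec<u8>> {
        payload
            .chunks(max_chunk_size)
            .map(|piece| {
                let mut chunk = Vec::with_capacity(piece.len() + 1);
                chunk.push(0xc0 | (csid & 0x3f)); // e.g. 0xc4 for csid 4
                chunk.extend_from_slice(piece);
                chunk
            })
            .collect()
    }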

pvheiningen commented 3 years ago

Thank you for looking into this!

From what I found, the serializer seems to be working fine and is serializing the final chunk with the correct size. I went through most of the serializing/deserializing logic and it all looked correct.

From what I found in the logs, the next byte that is read (the type 1 header) is actually the start of the next RTMP message, so we're missing the final part of the key frame.

When I changed this line (as an experiment) to write the full 'bytes' to the logs instead of only the 'bytes_sent', the log parser could parse the results, whereas it reliably failed to do so before.

pvheiningen commented 3 years ago

All right, I believe I have found the problem. The default buffer sizes on the TcpStream are too small for the larger payloads that I'm trying to send.

By default, the socket has these settings: nodelay: false, recv buffer: 369280, send buffer: 87040. When I increased the buffer sizes, it started working reliably!
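
For anyone landing here later, the fix amounts to enlarging the socket buffers when the connection is set up. A rough sketch, assuming a mio 0.6-era TcpStream that exposes these setters (newer mio versions dropped them, in which case the socket2 crate does the same job); the sizes here are illustrative, not the values from the PR:

    use mio::net::TcpStream;

    // Assumed mio 0.6-style API; sizes are illustrative only.
    fn enlarge_buffers(stream: &TcpStream) -> std::io::Result<()> {
        // Pick something comfortably above the largest burst you expect to queue,
        // e.g. a full key frame at a high bitrate.
        stream.set_recv_buffer_size(4 * 1024 * 1024)?;
        stream.set_send_buffer_size(4 * 1024 * 1024)?;
        Ok(())
    }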

I'll create a PR with my changes so you can have a look.

KallDrexx commented 3 years ago

Fascinating. I guess I haven't done a lot of network programming before; I figured the TCP buffers would just expand to hold whatever I wrote to them. I also guess the reason I couldn't reproduce this is that I was testing locally, which (at least on Windows) probably takes advantage of loopback short-circuiting. I should have paid more attention to your connection.rs investigation, though I'm not sure I would have figured that out.

Thanks a ton for the PR!

KallDrexx commented 3 years ago

I went ahead and merged it since it looks good to me (also, I'm not sure I ever expected the mio example to be production ready :D).

Thinking more about this, I really should have paid more attention to your observation about the bytes vs bytes_sent mismatch. I must have gotten the wrong meaning when I originally wrote it, because now what's happening makes perfect sense.

I suspect that while the PR you wrote works for now, the more sustainable solution is to put it back the way it was (original buffer size and the original write() instead of write_all()). You would then loop around connection.rs:203: if bytes_sent != bytes.len(), repeat sending only bytes[start..] until the buffer is emptied, and if a WouldBlock error occurs you'd requeue only the unsent bytes for when the socket becomes writable again.
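
A rough sketch of that write loop, assuming a non-blocking stream and a hypothetical send_queue holding the packets waiting to go out (this is not the example's actual code):

    use std::collections::VecDeque;
    use std::io::{Error, ErrorKind, Write};

    // Drain queued packets, keeping any unsent tail for the next writable event
    // instead of relying on larger OS socket buffers.
    fn flush_queue<W: Write>(stream: &mut W, send_queue: &mut VecDeque<Vec<u8>>) -> std::io::Result<()> {
        while let Some(bytes) = send_queue.pop_front() {
            let mut bytes_sent = 0;
            while bytes_sent < bytes.len() {
                match stream.write(&bytes[bytes_sent..]) {
                    Ok(0) => return Err(Error::from(ErrorKind::WriteZero)),
                    Ok(n) => bytes_sent += n, // partial write: advance and keep going
                    Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
                        // Socket buffer is full: requeue only the unsent bytes and
                        // wait for the next writable notification.
                        send_queue.push_front(bytes[bytes_sent..].to_vec());
                        return Ok(());
                    }
                    Err(e) => return Err(e),
                }
            }
        }
        Ok(())
    }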

I suspect the advantage of this is that you end up with more manageably sized TCP packets, which gives a better balance between the number of packets sent and how much data has to be retransmitted due to networking errors/packet loss.

I can experiment with that sometime in the next few days.

KallDrexx commented 3 years ago

Haven't been able to get back around to this until now. Weirdly enough, I can't get it to happen, even when using separate machines. Since the issue is fixed for you and the mio server is just a prototype, I'm going to leave it as it is so I don't accidentally break it on you :)

Thanks again!