Devolutions / IronRDP

Rust implementation of the Microsoft Remote Desktop Protocol (RDP)
Apache License 2.0

Devilish resize bug #447

Open ibeckermayer opened 4 months ago

ibeckermayer commented 4 months ago

Background

After adding protocol-level support for dynamic resizes via the DisplayControl DVC, this was all hooked up in the ironrdp-client in https://github.com/Devolutions/IronRDP/pull/430. The initial attempt succeeded in allowing dynamic resize; however, performance of the client noticeably degraded after a resize or two. This issue does not reproduce with FreeRDP, meaning that it's a bug in IronRDP.

Pointer settings and FreeRDP alignment

We tried many things to resolve the issue, including aligning our pointer settings with FreeRDP's (IronRDP rev f7b4f546650231ce345e9ee67f6ad29b2b93f937 is aligned with FreeRDP ba8cf8cf2158018fb7abbedb51ab245f369be813), all to no avail.

We know the problem is related to the RemoteFX codepath -- when we switched to using bitmaps, it went away.

Profiling

Comparing flamegraphs between an IronRDP session with a resize vs one without did not reveal anything relevant to this issue.

The absolute time of frame processing (basically how long active_stage.process(&mut image, action, &payload)? takes to run) was measured at <1ms, both before and after a resize. However, after a resize, the average time between frames (how often frame = framed.read_pdu() => { gets hit in the read loop) goes up 4-5x, from ~30ms to 140-150ms.
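For reference, measuring that inter-frame interval only takes a stopwatch around the read loop; here is a minimal sketch of the kind of instrumentation used (hypothetical helper, not the actual ironrdp-client code):

    use std::time::{Duration, Instant};

    /// Hypothetical helper (not actual ironrdp-client code): record a timestamp
    /// each time a PDU arrives and report the delta from the previous one.
    struct FrameTimer {
        last: Option<Instant>,
    }

    impl FrameTimer {
        fn new() -> Self {
            Self { last: None }
        }

        /// Call right after the read loop yields a frame.
        fn on_frame(&mut self) -> Option<Duration> {
            let now = Instant::now();
            let delta = self.last.map(|prev| now - prev);
            self.last = Some(now);
            delta
        }
    }

    fn main() {
        let mut timer = FrameTimer::new();
        // In the real client this would sit inside the read loop around framed.read_pdu().
        for _ in 0..3 {
            if let Some(delta) = timer.on_frame() {
                println!("time since previous frame: {delta:?}");
            }
            std::thread::sleep(Duration::from_millis(30));
        }
    }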

Temp Solution

What finally gave us a temporary solution was upping the maxUnacknowledgedFrameCount from 2 to >= 10 (credit to @probakowski for discovering this). We know that this is not any sort of root cause fix, because FreeRDP uses 2 for this field.

![image](https://github.com/Devolutions/IronRDP/assets/13578537/2393975e-ed72-438c-bc04-9c49954a44eb)

Implications, what to examine next

The maxUnacknowledgedFrameCount plays a role in when the server sends us frame updates: after sending us maxUnacknowledgedFrameCount frames, it waits for an ACK response before sending the next frame (a simplified sketch of this flow control follows the list below). The fact that bumping this value more or less solves the performance issue suggests that the holdup is related to that mechanism. Logically this seems to imply that the problem is either

  1. IronRDP is taking too long to reply to frames with an ACK (after resize)
  2. The RDP server is taking too long to process the ACKs after resize.
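To make the mechanism concrete, here is a toy model of that flow control (simplified, not actual server or IronRDP code): the server keeps at most maxUnacknowledgedFrameCount frames in flight and stalls until the client acknowledges one of them.

    use std::collections::VecDeque;

    // Toy model of the FrameAcknowledge window (simplified, not IronRDP code).
    struct ServerModel {
        max_unacknowledged_frame_count: u32,
        in_flight: VecDeque<u32>, // frame ids sent but not yet acknowledged
        next_frame_id: u32,
    }

    impl ServerModel {
        fn send_frame(&mut self) -> Option<u32> {
            if self.in_flight.len() as u32 >= self.max_unacknowledged_frame_count {
                return None; // server stalls here until a FrameAcknowledge arrives
            }
            let id = self.next_frame_id;
            self.next_frame_id += 1;
            self.in_flight.push_back(id);
            Some(id)
        }

        fn on_frame_acknowledge(&mut self, frame_id: u32) {
            self.in_flight.retain(|&id| id != frame_id);
        }
    }

    fn main() {
        let mut server = ServerModel {
            max_unacknowledged_frame_count: 2,
            in_flight: VecDeque::new(),
            next_frame_id: 0,
        };

        // With a window of 2, the third frame cannot be sent until the client
        // ACKs an earlier one, so a slow or late ACK stalls the whole stream.
        assert!(server.send_frame().is_some());
        assert!(server.send_frame().is_some());
        assert!(server.send_frame().is_none());
        server.on_frame_acknowledge(0);
        assert!(server.send_frame().is_some());
    }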
pacmancoder commented 3 months ago

I tried to reproduce and localize the issue with the RFX performance degradation, and here are my findings:

General observations

FrameAcknowledge is sent by IronRDP only after the full active_stage processing step

max_unacknowledged_frame_count controls how many surface update PDUs can be sent by the server before the client is asked to send a FrameAcknowledge PDU back.

By setting it to 20 in our code (FreeRDP uses a value of 2), @probakowski was able to temporarily fix the issue on their side, but the only difference I see in the PCAP after this change is less outgoing (client->server) traffic and more surface update messages per batch. The improvement in performance suggests either that the machine's network interface has very high latency for outgoing packets for some reason (@ibeckermayer said that they are using AWS hosts in us-east-1; measured ping to his local machine is up to 100ms), or that IronRDP for some reason sends ACK packets very late. The latter is the more likely problem, as FreeRDP supposedly shows no such performance degradation even with this much network latency.

I tried to validate this theory and looked into packet delta times with software cursor rendering enabled (just to have something I could reproduce):

We can see in the image below that the ACK is sent by IronRDP after an impressive 30ms interval, because we send response packets only after our active_stage has finished its processing logic (the unoptimized software cursor rendering produced such an interval in this case). What this actually means is that with small max_unacknowledged_frame_count values, while we are doing our (very slow) processing, the server waits a long time each time a small update batch is received and processed, and the overall image transfer time grows rapidly with image resolution. In theory, if the server receives the ACK in a timely manner, then by the time we finish active stage processing the next surface updates should already have been sent by the server and received/buffered by the client's socket. I am not 100% sure that this is the root cause of the issue, but in my opinion we should give it a shot. (@awakecoding your thoughts?)

image-1
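As a rough back-of-the-envelope illustration of why this compounds (hypothetical numbers, only to show the shape of the problem): if the client only ACKs after processing, every batch costs roughly the client processing time plus one round trip before the server sends the next batch.

    // Back-of-the-envelope model with hypothetical numbers, not measured data.
    fn total_transfer_ms(frames: u32, window: u32, processing_ms: u32, rtt_ms: u32) -> u32 {
        let batches = (frames + window - 1) / window; // ceil(frames / window)
        batches * (processing_ms + rtt_ms)
    }

    fn main() {
        // e.g. 40 surface updates after a resize, ~30ms client processing, ~100ms RTT:
        println!("window = 2:  ~{} ms", total_transfer_ms(40, 2, 30, 100)); // ~2600 ms
        println!("window = 20: ~{} ms", total_transfer_ms(40, 20, 30, 100)); // ~260 ms
    }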

I think the proper fix here would be sending the ACK immediately after receiving RFX packets, before the actual processing; however, that might require a dirty workaround or a big architecture change in IronRDP. Some kind of workaround seems more logical at the current stage: we could try to fix this blindfolded, without a proper bug reproduction, and ask @ibeckermayer to try the new build.
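A sketch of what that ordering could look like, written against placeholder types rather than IronRDP's real ones (the actual integration point and PDU encoding are the open part):

    // Hypothetical sketch of the proposed ordering, using placeholder types
    // instead of IronRDP's real ones: acknowledge the frame as soon as its end
    // marker is seen, then do the (slow) decoding/rendering, so the server can
    // keep streaming in the meantime.
    trait Transport {
        fn send_frame_acknowledge(&mut self, frame_id: u32);
    }

    fn handle_frame<T: Transport>(transport: &mut T, frame_id: u32, payload: &[u8]) {
        // 1. Eager ACK: refill the server's unacknowledged-frame window right away
        //    instead of waiting for our processing to finish.
        transport.send_frame_acknowledge(frame_id);

        // 2. Only then run the expensive part (RFX decode, software cursor, etc.).
        process_surface_update(payload);
    }

    fn process_surface_update(_payload: &[u8]) {
        // placeholder for the active_stage-style processing
    }

    struct LoggingTransport;

    impl Transport for LoggingTransport {
        fn send_frame_acknowledge(&mut self, frame_id: u32) {
            println!("FrameAcknowledge sent for frame {frame_id}");
        }
    }

    fn main() {
        handle_frame(&mut LoggingTransport, 0, &[]);
    }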

Open question

There is still an open question: why does the issue only reproduce after a window resize, even if the window has been made smaller? To continue this research, I want to replicate the exact same environment as @ibeckermayer, that is an AWS VM running in us-east-1, same machine kind, same OS. Hopefully, it will help reproduce this properly and localize the issue.

Additional: PCAP issues

Next plan of actions

ibeckermayer commented 3 months ago

FrameAcknowledge is sent by IronRDP only after full active_stage processing step

Interesting findings, though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?

Additional: PCAP issues

(suspicious) Malformed (unexpected?) Orders PDU - This is most troubling, as it kills all further parsing of outgoing (client->server) traffic in Wireshark

I think if you look closer you'll find that it's just an ordinary, well-formed mouse-move fastpath PDU. If you look at similar FreeRDP packets you'll see that they're slightly different in that they always use the optional length2; however, changing IronRDP to do the same doesn't fix this error.

(minor) Malformed ClientInfo packet

IIRC FreeRDP shows the same "malformed" packet in the same place

(working on it)

One thing that may be worth trying is using another TLS library. We currently support rustls and native-tls, but only rustls gives us SSLKEYLOGFILE. However maybe there's another way to get that from native-tls, or another library altogether that isn't too difficult to integrate. This could help determine whether this is a rustls bug or something else.
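For reference, the rustls side of that is just a key-log hook on the client config; here is a sketch against a recent rustls (not IronRDP's actual TLS setup):

    use std::sync::Arc;

    // Sketch against a recent rustls (not IronRDP's actual TLS setup): enable the
    // SSLKEYLOGFILE mechanism so Wireshark can decrypt the captured session.
    fn client_config_with_key_log(roots: rustls::RootCertStore) -> rustls::ClientConfig {
        let mut config = rustls::ClientConfig::builder()
            .with_root_certificates(roots)
            .with_no_client_auth();

        // KeyLogFile writes TLS secrets to the file named by the SSLKEYLOGFILE
        // environment variable, and does nothing if the variable is unset.
        config.key_log = Arc::new(rustls::KeyLogFile::new());
        config
    }

If native-tls or another library can expose an equivalent hook, the same Wireshark workflow would work there too.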

Another thing I vaguely recall, though don't hold me to this, is that at one point IronRDP packet captures were working fine, and only later did they break. It might be worth going back to a version from within the last 2.5 years or so (say, around when rustls support was added) and sanity checking that this was the case. You could then try to narrow down precisely which commit broke Wireshark TLS parsing to help identify the culprit.

pacmancoder commented 3 months ago

2 additional observations:

1) We calculate uncompressedLength for ShareDataHeader differently from FreeRDP: we include the pdu type/compression fields' size, while FreeRDP doesn't. Pretty sure this is a bug on our side. However, fixing this still doesn't fix the Wireshark RDP dissector, sadly.

            dst.write_u8(self.stream_priority.to_u8().unwrap());
            dst.write_u16(cast_length!(
                "uncompressedLength",
                self.share_data_pdu.size()
  -                    + PDU_TYPE_FIELD_SIZE
  -                    + COMPRESSION_TYPE_FIELD_SIZE
  -                    + COMPRESSED_LENGTH_FIELD_SIZE
            )?);

2) We always send share_id in ShareControlHeader as 0. However, FreeRDP uses some meaningful value instead (most likely received during the connection activation process?)

                ironrdp_connector::legacy::encode_share_data(
                    self.user_channel_id,
                    self.io_channel_id,
  -                    0,
  +                   share_id,
                    ShareDataPdu::FrameAcknowledge(FrameAcknowledgePdu {
                       frame_id: marker.frame_id.unwrap_or(0),
                    }),
                    output,
                )
pacmancoder commented 3 months ago

Interesting findings though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?

Oof, I was re-validating my theory today, and it looks like that's just not it... It is around 10-30 ms with software cursor processing, yes, but without it, it is actually pretty small and very comparable to FreeRDP, generally taking 1-3 ms, with only occasional spikes around 10ms (FreeRDP has some spikes too). :cry:

I think if you look closer you'll find that that's just an ordinary, well formed mouse move fastpath PDU

You are right, this is indeed a valid PDU!

Looks like the updated course of action is (cc @awakecoding):

Also, tagging @CBenoit in case he has some ideas about this bug too.