Devilish resize bug - Githubissues

Background

After adding protocol level support for dynamic resizes via the DisplayControl DVC in

this was all actually hooked up in the ironrdp-client in https://github.com/Devolutions/IronRDP/pull/430. The initial attempt succeeded in allowing for dynamic resize, however after resizes performance of the client noticeably degraded after a resize or two. This issue does not reproduce with FreeRDP, meaning that it's a bug in IronRDP.

Pointer settings and FreeRDP alignment

We tried many things to resolve the issue including:

all variations of pointer settings in IronRDP
Meticulously aligning IronRDP's Capability Sets with FreeRDP's
Meticulously aligning IronRDP's Client MCS Connect Initial PDU with GCC Conference Create Request with FreeRDP's

(IronRDP rev f7b4f546650231ce345e9ee67f6ad29b2b93f937 is aligned with FreeRDP ba8cf8cf2158018fb7abbedb51ab245f369be813)

all to no avail.

We know the problem is related to the RemoteFX codepath -- when we switched to using bitmaps, it went away.

Profiling

Comparing flamegraphs between an IronRDP session with a resize vs without did not reveal anything of relevance to this issue: Download.

The absolute time of frame processing (basically how long active_stage.process(&mut image, action, &payload)? takes to run) was measured at <1ms, before and after resize. However, after a resize, the average time between frames (how often frame = framed.read_pdu() => { gets called) goes up 4-5x, 30ms vs 140-150ms).

Temp Solution

What finally gave us a temporary solution was upping the maxUnacknowledgedFrameCount from 2 to >= 10 (credit to @probakowski for discovering this). We know that this is not any sort of root cause, because FreeRDP uses 2 for this field

![image](https://github.com/Devolutions/IronRDP/assets/13578537/2393975e-ed72-438c-bc04-9c49954a44eb)

Implications, what to examine next

The maxUnacknowledgedFrameCount plays a role in when the server sends us frame updates: after sending us maxUnacknowledgedFrameCount frames, it waits for an ACK response before sending the next frame. The fact that bumping this value more or less solves the performance issue suggests that the holdup is related to that mechanism. Logically this seems to imply that the problem is either

IronRDP is taking too long to reply to frames with an ACK (after resize)
- Although processing and encoding the response (via active_stage.process(&mut image, action, &payload)?) doesn't appear to change much per the note in the Profiling section above, there may yet be issues due to either
  1. The actual act of writing the response frame, whose performance has not been measured.
  2. Perhaps there's a substantial difference within the <1ms timeframe of active_stage.process(&mut image, action, &payload)? before and after resize, and that timescale is noticeable to the naked eye wrt performance.
The RDP server is taking too long to process the ACKs after resize.

I tried to reproduce and localize the issue with the RFX performance degradation, and here is my findings:

General observations

RFX performance is greatly reduced when using software rendering for cursors, as currently cursor is rendered to the back buffer for every tile. This could be easily improved, I'll prepare follow-up PR for that.
@ibeckermayer have encountered RFX performance degradation even without software cursor rendering enabled (--no-server-pointer) in release build, with minimal logging (error level) however, I on the other side, can't reproduce this behavior for some reason. I tried different versions of host Windows (Windows 10, Windows server 2022), simulated network latency up to 100ms, simulated bitrate limiting down to 100kb/s (via tc and wondershaper, however performance dropped both for RFX and Bitmap codecs, no RFX-special performance drop).
However, I think I found one suspicious discrepancy between FreeRDP & IronRDP captures: the interval between surface update received and ACK sent by the client is a much bigger for IronRDP than for FreeRDP:

`FrameAcknowledge` is sent by IronRDP only after full active_stage processing step

max_unacknowledged_frame_count controls how much surface update PDUs could be sent by the server, until client is asked to send FrameAcknowledge PDU back.

By setting it to 20 in our code (FreeRDP uses value of 2), @probakowski has been able to temporarly fix the issue on their side, but the only difference after this change I see in PCAP is less outgoing (client->server) traffic and more surface update messages in a batch. The improvement in the performance suggests that either machine's network interface has very big latency for outgoing packets for fome reason (@ibeckermayer said that they are using AWS hosts in us-east-1, measured ping to his local machine is up to 100ms) or IronRDP for some reason sends ACK packets very late. However later is more likely the problem, as supposedly with FreeRDP there are no such performance degradation problems even with such big network latency.

I tried to validate this theory and looked into packet delta times with enabled cursor software rendering (just to have something what I could reproduce):

We could see that on the image below ACK is sent by IronRDP after impressive 30ms interval, because we send response packets only after our active_stage is finished its processing logic (not optimized cursor software rendering in this case produced such interval). However, what that actually means is with small max_unacknowledged_frame_count numbers, while we are doing processing on our side (which is very slow), the server waits very long time each time a small update batch is received and processed, and overall image transfer time is increased exponentially with the growth of image resolution. In theory, if server will receive ACK in timely maner, the moment we are finished active stage processing, the new surface updates should have already be sent by the server and received/buffered by the client's socket. I am not 100% sure that this is the root cause of the issue, however we should give it a shot in my opinion. (@awakecoding your thoughts?)

I think the proper fix here will be sending ACK immediately after receiving RFX packets before actual processing, however it might require dirty workaround or big architecture change in IronRDP to get this done. I think Some kind of workaround would me more logical at current stage. We could try to fix this blindfolded without proper bug reproduction and ask @ibeckermayer to try new build

Open question

There is still an open question, why the issue only reproduces after window resize, even if window has been made even smaller? To continue this research, I want to replicate the exact same environment as @ibeckermayer, that is AWS VM running in us-east-1, same machine kind, same OS. Hopefully, It will help to reproduce this properly and localize the issue.

Additional: PCAP issues

I found a few places when PCAP capture becomes unreadable and/or server sends error:
- (suspicious) Malformed (unexpected?) Orders PDU - This is most troubling as it kills all further parsing of outgoing (client->server) traffic parsing in WireShark
This PDU should be never sent, and this looks like a garbage/leftover data sent over wire. It is clear that after those packets WireShark's RDP pdu parser fails to work for outgoing traffic (client->server), displaying it as encrypted TLS stream. We definitely need to fix that issue in any case, because further researches will be very painful without readable PCAP captures. FreeRDP's capture looks fine, so it it definitely an IronRDP bug. (working on it).
- (suspicious) I found out that IronRDP uses the default (0) share_id for ShareControlHeader PDUs, in contrast to FreeRDP PCAPs, where we could clearly see some meaningful values here. I am not sure that this is directly related to RFX performance degradation (as I can't reproduce is properly), but at first glance, looks this value is completely ignored by the RDP server. However it might make sense to revisit this in future to make our packets more in-line with FreeRDP. (As far as I could see, we could get the correct share id during RDP connection sequence and the use it later to send ShareDataControl messages instead of hard coded share_id == 0)
- (minor) Malformed ClientInfo packet
It is not clear however, what causes this issue, as sent ClientInfo data is valid. However, even if client data is shown as malformed, this does not break PDU parsing in wireshark, so it looks more like WireShark bug.

Next plan of actions

(Research) If it is possible, (@awakecoding) could I get AWS machine in us-east-1 with Windows Server 2022, maybe with this host I could reproduce it? Or maybe @ibeckermayer could provide a public access to one of their isolated testing machines?
(Temporary fix/research) Make workaround for a slow FrameAcknowledge response and test / ask @ibeckermayer to test
(Bug) Improve software cursor rendering for RFX
(Bug) Fix mid-capture broken PCAP for IronRDP (Orders PDU)
(Bug) Send correct share_id for each ShareControl pdu

FrameAcknowledge is sent by IronRDP only after full active_stage processing step

Interesting findings though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?

Additional: PCAP issues

(suspicious) Malformed (unexpected?) Orders PDU - This is most troubling as it kills all further parsing of outgoing (client-> server) traffic parsing in WireShark

I think if you look closer you'll find that that's just an ordinary, well formed mouse move fastpath PDU. If you look at similar FreeRDP packets you'll see that they're slightly different in that they always use the optional length2, however changing IronRDP to do the same doesn't fix this error.

(minor) Malformed ClientInfo packet

IIRC FreeRDP shows the same "malformed" packet in the same place

(working on it)

One thing that may be worth trying is using another TLS library. We currently support rustls and native-tls, but only rustls gives us SSLKEYLOGFILE. However maybe there's another way to get that from native-tls, or another library altogether that isn't too difficult to integrate. This could help determine whether this is a rustls bug or something else.

Another thing I vaguely recall, though don't hold me to this, is that at one point IronRDP packet captures were working fine, and only later did they break. It might be worth going back to a version from within the last 2.5 years or so (say, maybe when rustls support was added) and just sanity checking that this was the case. You could then try to narrow down precisely which commit broke wireshark TLS parsing to help narrow down the culprit.

2 additional observations:

1) we calculate uncompressedLength for ShareDataHeader differently from FreeRDP, we include pdu type/compression fields size, while FreeRDP don't. Pretty sure this is bug on our side. However fixing this still not fixes wire shark rdp dissector sadly.

            dst.write_u8(self.stream_priority.to_u8().unwrap());
            dst.write_u16(cast_length!(
                "uncompressedLength",
                self.share_data_pdu.size()
  -                    + PDU_TYPE_FIELD_SIZE
  -                    + COMPRESSION_TYPE_FIELD_SIZE
  -                    + COMPRESSED_LENGTH_FIELD_SIZE
            )?);

2) We always send share_id in ShareControlHeader as 0. However FreeRDP uses some meaningful value instead (most likely, received during connection activation process?)

                ironrdp_connector::legacy::encode_share_data(
                    self.user_channel_id,
                    self.io_channel_id,
  -                    0,
  +                   share_id,
                    ShareDataPdu::FrameAcknowledge(FrameAcknowledgePdu {
                       frame_id: marker.frame_id.unwrap_or(0),
                    }),
                    output,
                )

Interesting findings though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?

Oof, I was re-validating my theory today and looks like it just not it... It is around 10-30 ms with software cursor processing, yes, but without it, it is actually pretty small and very comparable with FreeRDP, by taking up to 1-3 ms in general, only with spikes around 10ms (FreeRDP have some spikes too). :cry:

I think if you look closer you'll find that that's just an ordinary, well formed mouse move fastpath PDU

You are right, this should be the valid PDU indeed!

Looks like the updated course of action is (cc @awakecoding):

Continue attempts to reproduce the bug (@ibeckermayer kindly agreed to provide his AWS machine access to me today, let's hope it will help reproduce this indeed devilish bug)
I had an idea today that we could get unencrypted packets capture via RDP proxying with Gateway PCAP interceptor instead of SSLKEYLOGFILE - might help with broken Wireshark dissector problem if it is TLS-related?
If gateway capture will not help -- switch to other TLS lib temporary, to achieve clean non-corrupted PCAPs
Fix found bugs:
- Invalid ShareDataHeaders uncompressedLength
- Optimize pointer software rendering for RFX (seems not related to RFX, but it is easy enough to fix)
- Investigate invalid share_id impact on performance

Also, tagging @CBenoit in case he have some ideas about this bug too

Devolutions / IronRDP

Devilish resize bug #447