ibeckermayer opened this issue 4 months ago
I tried to reproduce and localize the issue with the RFX performance degradation, and here are my findings (tested with `--no-server-pointer`, in a release build, with minimal logging at the error level). However, I, on the other side, can't reproduce this behavior for some reason. I tried different versions of host Windows (Windows 10, Windows Server 2022), simulated network latency up to 100ms, and simulated bitrate limiting down to 100kb/s (via `tc` and `wondershaper`); however, performance dropped for both the RFX and Bitmap codecs, with no RFX-specific performance drop.

**FrameAcknowledge is sent by IronRDP only after full active_stage processing step**

`max_unacknowledged_frame_count` controls how many surface update PDUs the server may send before the client is asked to send a `FrameAcknowledge` PDU back.
By setting it to `20` in our code (FreeRDP uses a value of `2`), @probakowski has been able to temporarily fix the issue on their side, but the only difference I see in the PCAP after this change is less outgoing (client->server) traffic and more surface update messages in a batch. The improvement in performance suggests that either the machine's network interface has very big latency for outgoing packets for some reason (@ibeckermayer said that they are using AWS hosts in `us-east-1`; measured ping to his local machine is up to 100ms), or IronRDP for some reason sends ACK packets very late. The latter is the more likely problem, as supposedly FreeRDP shows no such performance degradation even with such big network latency.
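To see why a late ACK hurts so much, here is a toy model of the acknowledgment window (my own simplified model with made-up numbers, not IronRDP code): the server may have at most `window` unacknowledged frames in flight, the client ACKs a frame only after its processing finishes, and each network direction adds a fixed latency.

```rust
// Toy model of the FrameAcknowledge window (all numbers hypothetical):
// the server stalls once `window` frames are unacknowledged; the client
// ACKs a frame only after `client_process_ms` of processing; each network
// direction costs `one_way_ms`. Returns when the last frame finishes.
fn total_transfer_ms(frames: usize, window: usize, one_way_ms: u64, client_process_ms: u64) -> u64 {
    let mut ack_at_server = vec![0u64; frames]; // when frame i's ACK reaches the server
    let mut client_free = 0u64; // when the client finishes the previous frame
    let mut last_done = 0u64;
    for i in 0..frames {
        // The server must see the ACK of frame (i - window) before sending frame i.
        let send_at = if i >= window { ack_at_server[i - window] } else { 0 };
        let arrival = send_at + one_way_ms;
        // The client processes frames sequentially, then sends the ACK back.
        let done = arrival.max(client_free) + client_process_ms;
        client_free = done;
        ack_at_server[i] = done + one_way_ms;
        last_done = done;
    }
    last_done
}

fn main() {
    // With ~50ms one-way latency, a window of 2 throttles the transfer to
    // roughly one round trip per 2 frames; a window of 20 lets frames stream
    // while earlier ACKs are still in flight.
    println!("window=2:  {} ms", total_transfer_ms(100, 2, 50, 1));
    println!("window=20: {} ms", total_transfer_ms(100, 20, 50, 1));
}
```

Under this model, transferring 100 frames with a window of 2 takes several times longer than with a window of 20, even though nothing else changed, which matches the direction of the observed improvement.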
I tried to validate this theory and looked into packet delta times with cursor software rendering enabled (just to have something I could reproduce):
On the image below we can see that the ACK is sent by IronRDP after an impressive 30ms interval, because we send response packets only after our `active_stage` has finished its processing logic (the non-optimized cursor software rendering produced such an interval in this case). What that actually means is that with small `max_unacknowledged_frame_count` values, while we are doing processing on our side (which is very slow), the server waits a very long time each time a small update batch is received and processed, and the overall image transfer time grows steeply with image resolution. In theory, if the server receives the ACK in a timely manner, then by the moment we finish active stage processing, the next surface updates should already have been sent by the server and received/buffered by the client's socket. I am not 100% sure that this is the root cause of the issue, but in my opinion we should give it a shot. (@awakecoding your thoughts?)
I think the proper fix here would be to send the ACK immediately after receiving RFX packets, before the actual processing; however, that might require a dirty workaround or a big architecture change in IronRDP. Some kind of workaround would be more logical at the current stage. We could try to fix this blindfolded, without a proper bug reproduction, and ask @ibeckermayer to try the new build.
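A sketch of what "ACK before processing" could look like (invented names, not the real IronRDP API): a `FrameAcknowledge` PDU only needs the `frame_id`, which is known as soon as the frame marker is parsed, so it can be encoded before the expensive decode step runs.

```rust
// Hypothetical sketch, not real IronRDP code: acknowledge a frame as soon as
// its frame_id is known, before the expensive decode step, so the server's
// unacknowledged-frame window frees up while we are still decoding.
struct IncomingFrame {
    frame_id: u32,
    payload: Vec<u8>,
}

// Placeholder for encoding a FrameAcknowledge PDU into the output buffer.
fn encode_frame_acknowledge(frame_id: u32, out: &mut Vec<u8>) {
    out.extend_from_slice(&frame_id.to_le_bytes());
}

// Placeholder for the slow part: RFX decode + software cursor rendering.
fn decode_into_image(payload: &[u8], image: &mut Vec<u8>) {
    image.extend_from_slice(payload);
}

fn handle_frame(frame: IncomingFrame, out: &mut Vec<u8>, image: &mut Vec<u8>) {
    // 1. ACK first: cheap, unblocks the server immediately.
    encode_frame_acknowledge(frame.frame_id, out);
    // 2. Only then run the expensive processing.
    decode_into_image(&frame.payload, image);
}

fn main() {
    let (mut out, mut image) = (Vec::new(), Vec::new());
    handle_frame(IncomingFrame { frame_id: 7, payload: vec![1, 2, 3] }, &mut out, &mut image);
    assert_eq!(out, vec![7, 0, 0, 0]);
    assert_eq!(image, vec![1, 2, 3]);
}
```

The tricky part in the real client is that today the ACK is produced as an output of `active_stage` processing, so reordering it would touch the pipeline structure, hence the "workaround vs. architecture change" question above.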
There is still an open question: why does the issue only reproduce after a window resize, even if the window has been made smaller? To continue this research, I want to replicate the exact same environment as @ibeckermayer: an AWS VM running in `us-east-1`, same machine kind, same OS. Hopefully, that will help to reproduce this properly and localize the issue.
I found a few places where the PCAP capture becomes unreadable and/or the server sends an error:
This PDU should never be sent, and it looks like garbage/leftover data sent over the wire. It is clear that after those packets WireShark's RDP PDU parser fails to work for outgoing traffic (client->server), displaying it as an encrypted TLS stream. We definitely need to fix that issue in any case, because further research will be very painful without readable PCAP captures. FreeRDP's capture looks fine, so it is definitely an IronRDP bug. (working on it)
(suspicious) I found out that IronRDP uses the default (`0`) `share_id` for `ShareControlHeader` PDUs, in contrast to FreeRDP PCAPs, where we can clearly see some meaningful values here. I am not sure that this is directly related to the RFX performance degradation (as I can't reproduce it properly), and at first glance this value looks completely ignored by the RDP server. Still, it might make sense to revisit this in the future to make our packets more in line with FreeRDP. (As far as I can see, we could get the correct share id during the RDP connection sequence and then use it later to send `ShareDataControl` messages instead of the hardcoded `share_id == 0`.)
(minor) Malformed ClientInfo packet
It is not clear, however, what causes this issue, as the sent ClientInfo data is valid. Even though the client data is shown as malformed, this does not break PDU parsing in WireShark, so it looks more like a WireShark bug.
- (Research) If it is possible (@awakecoding), could I get an AWS machine in `us-east-1` with Windows Server 2022? Maybe with this host I could reproduce it. Or maybe @ibeckermayer could provide public access to one of their isolated testing machines?
- (Temporary fix/research) Make a workaround for the slow FrameAcknowledge response and test / ask @ibeckermayer to test
- (Bug) Improve software cursor rendering for RFX
- (Bug) Fix mid-capture broken PCAP for IronRDP (Orders PDU)
- (Bug) Send the correct `share_id` for each ShareControl PDU
> FrameAcknowledge is sent by IronRDP only after full active_stage processing step

Interesting findings, though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?
Additional: PCAP issues

> (suspicious) Malformed (unexpected?) Orders PDU - This is most troubling as it kills all further parsing of outgoing (client->server) traffic in WireShark

I think if you look closer you'll find that that's just an ordinary, well-formed mouse move fastpath PDU. If you look at similar FreeRDP packets you'll see that they're slightly different in that they always use the optional `length2`; however, changing IronRDP to do the same doesn't fix this error.
> (minor) Malformed ClientInfo packet

IIRC FreeRDP shows the same "malformed" packet in the same place.

> (working on it)
One thing that may be worth trying is using another TLS library. We currently support rustls and native-tls, but only rustls gives us `SSLKEYLOGFILE`. Maybe there's another way to get that from native-tls, or another library altogether that isn't too difficult to integrate. This could help determine whether this is a rustls bug or something else.
Another thing I vaguely recall, though don't hold me to this, is that at one point IronRDP packet captures were working fine, and only later did they break. It might be worth going back to a version from within the last 2.5 years or so (say, maybe when rustls support was added) and just sanity checking that this was the case. You could then try to narrow down precisely which commit broke wireshark TLS parsing to help narrow down the culprit.
2 additional observations:
1) We calculate `uncompressedLength` for `ShareDataHeader` differently from FreeRDP: we include the PDU type/compression field sizes, while FreeRDP doesn't. Pretty sure this is a bug on our side. However, fixing this still doesn't fix the WireShark RDP dissector, sadly.
```diff
 dst.write_u8(self.stream_priority.to_u8().unwrap());
 dst.write_u16(cast_length!(
     "uncompressedLength",
     self.share_data_pdu.size()
-        + PDU_TYPE_FIELD_SIZE
-        + COMPRESSION_TYPE_FIELD_SIZE
-        + COMPRESSED_LENGTH_FIELD_SIZE
 )?);
```
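For illustration, the removed lines amount to a constant 4-byte discrepancy, assuming the MS-RDPBCGR Share Data Header field sizes (pduType2: 1 byte, compressedType: 1 byte, compressedLength: 2 bytes). The constants below are my assumption for illustration, not copied from IronRDP:

```rust
// Assumed field sizes (MS-RDPBCGR Share Data Header): pduType2 and
// compressedType are 1 byte each, compressedLength is 2 bytes.
const PDU_TYPE_FIELD_SIZE: usize = 1;
const COMPRESSION_TYPE_FIELD_SIZE: usize = 1;
const COMPRESSED_LENGTH_FIELD_SIZE: usize = 2;

// Old IronRDP behavior: header field sizes folded into uncompressedLength.
fn uncompressed_length_with_header(share_data_pdu_size: usize) -> usize {
    share_data_pdu_size + PDU_TYPE_FIELD_SIZE + COMPRESSION_TYPE_FIELD_SIZE + COMPRESSED_LENGTH_FIELD_SIZE
}

// FreeRDP-style: just the share data PDU size.
fn uncompressed_length_freerdp(share_data_pdu_size: usize) -> usize {
    share_data_pdu_size
}

fn main() {
    // Whatever the payload size, the two calculations differ by 4 bytes.
    for &size in &[0usize, 10, 1500] {
        assert_eq!(uncompressed_length_with_header(size) - uncompressed_length_freerdp(size), 4);
    }
}
```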
2) We always send `share_id` in `ShareControlHeader` as `0`. However, FreeRDP uses some meaningful value instead (most likely received during the connection activation process?).
```diff
 ironrdp_connector::legacy::encode_share_data(
     self.user_channel_id,
     self.io_channel_id,
-    0,
+    share_id,
     ShareDataPdu::FrameAcknowledge(FrameAcknowledgePdu {
         frame_id: marker.frame_id.unwrap_or(0),
     }),
     output,
 )
```
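If I remember the connection sequence right, the server's shareId arrives in the Demand Active PDU, so one option is to capture it there and carry it through the connection result. A minimal sketch with invented names (not the real IronRDP API):

```rust
// Hypothetical sketch: capture the shareId during the connection sequence
// and reuse it for every Share Control header instead of a hardcoded 0.
struct ConnectionState {
    share_id: u32,
}

impl ConnectionState {
    // In MS-RDPBCGR the shareId is the first field of the Demand Active PDU
    // body (4 bytes, little-endian); real code would parse the whole PDU.
    fn from_demand_active_body(body: &[u8]) -> Option<Self> {
        let bytes: [u8; 4] = body.get(..4)?.try_into().ok()?;
        Some(Self { share_id: u32::from_le_bytes(bytes) })
    }
}

fn main() {
    let body = [0x78, 0x56, 0x34, 0x12, /* remaining PDU fields */ 0x00];
    let state = ConnectionState::from_demand_active_body(&body).unwrap();
    assert_eq!(state.share_id, 0x1234_5678);
}
```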
> Interesting findings though it's not clear why this would cause a change in behavior before/after resize. Does the 30ms change before/after? Is FreeRDP's substantially faster?

Oof, I was re-validating my theory today and it looks like that's just not it... The interval is around 10-30ms with software cursor processing, yes, but without it, it is actually pretty small and very comparable with FreeRDP, generally taking up to 1-3ms, with only occasional spikes around 10ms (FreeRDP has some spikes too). :cry:
> I think if you look closer you'll find that that's just an ordinary, well formed mouse move fastpath PDU

You are right, this is indeed a valid PDU!
Looks like the updated course of action is (cc @awakecoding):
- `SSLKEYLOGFILE` - might help with the broken Wireshark dissector problem if it is TLS-related?
- Fix `ShareDataHeader`'s `uncompressedLength`
- Check the `share_id` impact on performance

Also, tagging @CBenoit in case he has some ideas about this bug too.
Background

After adding protocol-level support for dynamic resizes via the DisplayControl DVC, this was all actually hooked up in the `ironrdp-client` in https://github.com/Devolutions/IronRDP/pull/430. The initial attempt succeeded in allowing for dynamic resize; however, performance of the client noticeably degraded after a resize or two. This issue does not reproduce with FreeRDP, meaning that it's a bug in IronRDP.

Pointer settings and FreeRDP alignment
We tried many things to resolve the issue, including various pointer settings and aligning IronRDP with FreeRDP (IronRDP rev `f7b4f546650231ce345e9ee67f6ad29b2b93f937` is aligned with FreeRDP `ba8cf8cf2158018fb7abbedb51ab245f369be813`), all to no avail.
We know the problem is related to the RemoteFX codepath -- when we switched to using bitmaps, it went away.
Profiling
Comparing flamegraphs between an IronRDP session with a resize vs. without did not reveal anything of relevance to this issue: Download.
The absolute time of frame processing (basically how long `active_stage.process(&mut image, action, &payload)?` takes to run) was measured at <1ms, before and after resize. However, after a resize, the average time between frames (how often `frame = framed.read_pdu() => {` gets called) goes up 4-5x (30ms vs. 140-150ms).
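Measurements of this kind can be reproduced with a plain `std::time` probe; this is a generic sketch, not the actual instrumentation that was used:

```rust
use std::time::{Duration, Instant};

// Times a single closure call, e.g. around the active_stage.process(...) step.
fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

// Tracks the gap between consecutive frames, e.g. around framed.read_pdu().
struct FrameGap {
    last: Option<Instant>,
}

impl FrameGap {
    fn new() -> Self {
        Self { last: None }
    }

    // Returns the time since the previous call, if any.
    fn tick(&mut self) -> Option<Duration> {
        let now = Instant::now();
        let gap = self.last.map(|prev| now - prev);
        self.last = Some(now);
        gap
    }
}

fn main() {
    let spent = time_it(|| std::thread::sleep(Duration::from_millis(5)));
    println!("processing took {spent:?}");

    let mut gaps = FrameGap::new();
    assert!(gaps.tick().is_none()); // first frame has no predecessor
    std::thread::sleep(Duration::from_millis(5));
    println!("inter-frame gap: {:?}", gaps.tick().unwrap());
}
```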
Temp Solution

What finally gave us a temporary solution was upping the `maxUnacknowledgedFrameCount` from `2` to `>= 10` (credit to @probakowski for discovering this). We know that this is not any sort of root cause, because FreeRDP uses `2` for this field.

Implications, what to examine next
The `maxUnacknowledgedFrameCount` plays a role in when the server sends us frame updates: after sending us `maxUnacknowledgedFrameCount` frames, it waits for an ACK response before sending the next frame. The fact that bumping this value more or less solves the performance issue suggests that the holdup is related to that mechanism. Logically this seems to imply that the problem is either in how quickly we return the ACK or in how the server reacts to it; it is hard to blame the frame processing itself, given the <1ms timeframe of `active_stage.process(&mut image, action, &payload)?` before and after resize, and that timescale is not noticeable to the naked eye wrt performance.
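As a back-of-the-envelope check (my own arithmetic, using only the ~100ms ping mentioned earlier in the thread): if the server stalls after `maxUnacknowledgedFrameCount` unacknowledged frames and an ACK costs a full round trip, throughput is bounded by roughly window / RTT frames per second, which illustrates why raising the window helps so much here.

```rust
// Rough upper bound on frame rate when the ACK window is the bottleneck:
// the server can push at most `window` frames per round trip.
fn max_fps(window: f64, rtt_ms: f64) -> f64 {
    window / (rtt_ms / 1000.0)
}

fn main() {
    // With the ~100ms ping mentioned in the issue (illustrative only):
    println!("window=2:  ~{:.0} fps upper bound", max_fps(2.0, 100.0));
    println!("window=10: ~{:.0} fps upper bound", max_fps(10.0, 100.0));
}
```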