PowerShell / Win32-OpenSSH

Win32 port of OpenSSH
7.37k stars 758 forks source link

[BUG] Unnecessary delay in ssh-keyscan #1864

Open jvpernis opened 2 years ago

jvpernis commented 2 years ago

There is a bug in this fork's version of ssh-keyscan where there is always an unnecessary delay of at least timeout seconds before it actually starts processing and printing the results. This behavior is not present in the unix versions of the tool, nor is it present in the MinGW's Windows build of this tool.

The bug is present in both OpenSSH_for_Windows_8.1p1 as well as the latest pre-release OpenSSH_for_Windows_8.6p1 installed through Chocolatey.

Steps to reproduce:

The default timeout is 5 seconds, but you can provide a custom timeout with the -T argument:

With the default timeout:

PS > Measure-Command { ssh-keyscan github.com | Out-Default }
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 5
Milliseconds      : 741
Ticks             : 57419208
TotalDays         : 6.64574166666667E-05
TotalHours        : 0.001594978
TotalMinutes      : 0.09569868
TotalSeconds      : 5.7419208
TotalMilliseconds : 5741.9208

With a 1 second timeout:

PS > Measure-Command { ssh-keyscan -T 1 github.com | Out-Default } 
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 722
Ticks             : 17229522
TotalDays         : 1.99415763888889E-05
TotalHours        : 0.000478597833333333
TotalMinutes      : 0.02871587
TotalSeconds      : 1.7229522
TotalMilliseconds : 1722.9522

With a 10 second timeout:

PS > Measure-Command { ssh-keyscan -T 10 github.com | Out-Default }      
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
# github.com:22 SSH-2.0-babeld-e37038b1
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 10
Milliseconds      : 727
Ticks             : 107271945
TotalDays         : 0.00012415734375
TotalHours        : 0.00297977625
TotalMinutes      : 0.178786575
TotalSeconds      : 10.7271945
TotalMilliseconds : 10727.1945

As you can see, it takes roughly 700ms to execute the key scanning activities, but there is an additional delay directly influenced by the timeout value.

Cause:

After doing some investigations, I found the exact location in the ssh-keyscan source code where the delay takes place:

Source [openssh-portable/ssh-keyscan.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/ssh-keyscan.c#L596-L598) ```c while (select(maxfd, r, NULL, e, &seltime) == -1 && (errno == EAGAIN || errno == EINTR || errno == EWOULDBLOCK)) ; ```

The select() is macro'd to w32_select():

Source [openssh-portable/contrib/win32/win32compat/inc/sys/select.h](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/inc/sys/select.h#L29-L31) ```c int w32_select(int fds, w32_fd_set * , w32_fd_set * , w32_fd_set * , const struct timeval *); #define select(a,b,c,d,e) w32_select((a), (b), (c), (d), (e)) ```

The culprit is found in this snippet from w32_select(), where an async I/O is started on the selected fds and the relevant events are tracked:

Source [openssh-portable/contrib/win32/win32compat/w32fd.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/w32fd.c#L772-L802) ```c /* * start async io on selected fds if needed and pick up any events * that select needs to listen on */ for (int i = 0; i < fds; i++) { if (readfds && FD_ISSET(i, readfds)) { w32_io_on_select(fd_table.w32_ios[i], TRUE); if ((fd_table.w32_ios[i]->type == SOCK_FD) && (fd_table.w32_ios[i]->internal.state == SOCK_LISTENING)) { if (num_events == SELECT_EVENT_LIMIT) { debug3("select - ERROR: max #events breach"); errno = ENOMEM; return -1; } events[num_events++] = fd_table.w32_ios[i]->read_overlapped.hEvent; } } if (writefds && FD_ISSET(i, writefds)) { w32_io_on_select(fd_table.w32_ios[i], FALSE); if ((fd_table.w32_ios[i]->type == SOCK_FD) && (fd_table.w32_ios[i]->internal.state == SOCK_CONNECTING)) { if (num_events == SELECT_EVENT_LIMIT) { debug3("select - ERROR: max #events reached for select"); errno = ENOMEM; return -1; } events[num_events++] = fd_table.w32_ios[i]->write_overlapped.hEvent; } } } ```

Then later, a blocking wait is called which wakes either on any event, or when the timeout is reached:

Source [openssh-portable/contrib/win32/win32compat/w32fd.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/w32fd.c#L843-L844) ```c if (0 != wait_for_any_event(events, num_events, time_rem)) return -1; ```

However, for this particular call to w32_select(), there were no events tracked and num_events remains 0. Therefore, all wait_for_any_event() can do is wait for the timeout.

Workaround

As I showed earlier with the steps to reproduce, we can provide a lower timeout value to lower the amount of useless waiting. One second is the lowest value we can give to ssh-keyscan. So even though we shaved off 4 seconds of useless waiting, there is still a full second of waiting which should just not be there.

Even if ssh-keyscan would accept sub-second timeout values, this would not be desirable since it limits the amount of jitter that is tolerated within a network before the actual timeout is triggered at other places in the code.

Bugfix patch

I have prepared a bugfix patch which just checks the I/O fds periodically when there are no events to track. This worked really well to get rid of the useless waiting, and doesn't influence other users of this method.

diff --git a/contrib/win32/win32compat/w32fd.c b/contrib/win32/win32compat/w32fd.c
index 01f59016..8457a4fa 100644
--- a/contrib/win32/win32compat/w32fd.c
+++ b/contrib/win32/win32compat/w32fd.c
@@ -835,7 +835,12 @@ w32_select(int fds, w32_fd_set* readfds, w32_fd_set* writefds, w32_fd_set* excep
                                        debug4("select - timing out");
                                        break;
                                }
-                               time_rem = timeout_ms - (ticks_spent & 0xffffffff);
+
+                               /* just periodically check the fds when there are no events to listen for */
+                               if (num_events == 0)
+                                       time_rem = 10;
+                               else
+                                       time_rem = timeout_ms - (ticks_spent & 0xffffffff);
                        }
                        else
                                time_rem = INFINITE;

If this fix is considered acceptable I can raise a PR with this patch. However, I am not convinced that this would be a very structural solution to this problem. Ideally there would be I/O events to wait for by wait_for_any_event(), but I do not know how to achieve this.

AndrewSav commented 2 years ago

We are also affected by this issue