There is a bug in this fork's version of `ssh-keyscan`: there is always an unnecessary delay of at least `timeout` seconds before it actually starts processing and printing the results. This behavior is not present in the Unix versions of the tool, nor in the MinGW Windows build.
The bug is present in both `OpenSSH_for_Windows_8.1p1` and the latest pre-release, `OpenSSH_for_Windows_8.6p1`, installed through Chocolatey.
Steps to reproduce:
The default timeout is 5 seconds, but you can provide a custom timeout with the `-T` argument. I ran the scan with the default timeout, with a 1 second timeout, and with a 10 second timeout.
In each run, the key scanning itself takes roughly 700 ms, but there is an additional delay directly influenced by the timeout value.
Cause:
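After some investigation, I traced the delay to the `select()` loop in `ssh-keyscan.c`. In this fork, `select()` is a macro for `w32_select()`, which starts async I/O on the selected fds, collects the events it needs to listen on, and then blocks in `wait_for_any_event()` until an event fires or the timeout is reached.
However, for this particular call to `w32_select()`, no events are tracked and `num_events` remains 0. Therefore, all `wait_for_any_event()` can do is wait for the timeout.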
Workaround
As shown in the steps to reproduce, a lower timeout value reduces the useless waiting. One second is the lowest value `ssh-keyscan` accepts, so even though this shaves off 4 seconds, there is still a full second of waiting that should not be there.
Even if `ssh-keyscan` accepted sub-second timeout values, using them would not be desirable, since it limits the amount of network jitter that is tolerated before the actual timeout is triggered at other places in the code.
Bugfix patch
I have prepared a bugfix patch that simply re-checks the I/O fds periodically (every 10 ms) when there are no events to track. In my testing this got rid of the useless waiting, and it does not affect other callers of this function.
```diff
diff --git a/contrib/win32/win32compat/w32fd.c b/contrib/win32/win32compat/w32fd.c
index 01f59016..8457a4fa 100644
--- a/contrib/win32/win32compat/w32fd.c
+++ b/contrib/win32/win32compat/w32fd.c
@@ -835,7 +835,12 @@ w32_select(int fds, w32_fd_set* readfds, w32_fd_set* writefds, w32_fd_set* excep
 				debug4("select - timing out");
 				break;
 			}
-			time_rem = timeout_ms - (ticks_spent & 0xffffffff);
+
+			/* just periodically check the fds when there are no events to listen for */
+			if (num_events == 0)
+				time_rem = 10;
+			else
+				time_rem = timeout_ms - (ticks_spent & 0xffffffff);
 		}
 		else
 			time_rem = INFINITE;
```
If this fix is considered acceptable, I can raise a PR with this patch. However, I am not convinced that it is a proper structural solution to the problem. Ideally, there would be I/O events for `wait_for_any_event()` to wait on, but I do not know how to achieve this.
Cause (source walkthrough):
After doing some investigation, I found the exact location in the `ssh-keyscan` source code where the delay takes place:

Source: [openssh-portable/ssh-keyscan.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/ssh-keyscan.c#L596-L598)
```c
while (select(maxfd, r, NULL, e, &seltime) == -1 &&
    (errno == EAGAIN || errno == EINTR || errno == EWOULDBLOCK))
	;
```

`select()` is defined as a macro for `w32_select()`:

Source: [openssh-portable/contrib/win32/win32compat/inc/sys/select.h](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/inc/sys/select.h#L29-L31)
```c
int w32_select(int fds, w32_fd_set * , w32_fd_set * , w32_fd_set * , const struct timeval *);
#define select(a,b,c,d,e) w32_select((a), (b), (c), (d), (e))
```

The culprit is found in this snippet from `w32_select()`, where async I/O is started on the selected fds and the relevant events are tracked:

Source: [openssh-portable/contrib/win32/win32compat/w32fd.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/w32fd.c#L772-L802)
```c
/*
 * start async io on selected fds if needed and pick up any events
 * that select needs to listen on
 */
for (int i = 0; i < fds; i++) {
	if (readfds && FD_ISSET(i, readfds)) {
		w32_io_on_select(fd_table.w32_ios[i], TRUE);
		if ((fd_table.w32_ios[i]->type == SOCK_FD) &&
		    (fd_table.w32_ios[i]->internal.state == SOCK_LISTENING)) {
			if (num_events == SELECT_EVENT_LIMIT) {
				debug3("select - ERROR: max #events breach");
				errno = ENOMEM;
				return -1;
			}
			events[num_events++] = fd_table.w32_ios[i]->read_overlapped.hEvent;
		}
	}

	if (writefds && FD_ISSET(i, writefds)) {
		w32_io_on_select(fd_table.w32_ios[i], FALSE);
		if ((fd_table.w32_ios[i]->type == SOCK_FD) &&
		    (fd_table.w32_ios[i]->internal.state == SOCK_CONNECTING)) {
			if (num_events == SELECT_EVENT_LIMIT) {
				debug3("select - ERROR: max #events reached for select");
				errno = ENOMEM;
				return -1;
			}
			events[num_events++] = fd_table.w32_ios[i]->write_overlapped.hEvent;
		}
	}
}
```

Then later, a blocking wait is called which wakes either on any event, or when the timeout is reached:

Source: [openssh-portable/contrib/win32/win32compat/w32fd.c](https://github.com/PowerShell/openssh-portable/blob/75835a2462e1d8caf614cdb7011e45da929dc142/contrib/win32/win32compat/w32fd.c#L843-L844)
```c
if (0 != wait_for_any_event(events, num_events, time_rem))
	return -1;
```

However, for this particular call to `w32_select()`, there were no events tracked and `num_events` remains 0. Therefore, all `wait_for_any_event()`
can do is wait for the timeout.