Closed jhollowe closed 1 year ago
Sadly, when remote mounts/shares or iSCSI paths disappear, the I/O hangs and never completes. Often dt cannot even exit cleanly, because the kernel's I/O rundown cannot cancel the request. dt's hung I/O detection is performed by the monitoring thread, which as written today does not provide a way to report total statistics, sorry. Even if a hook were added before attempting to terminate the thread, the per-pass statistics would need to be accumulated before reporting the totals. I'll review.
Please try the following changes on a hung NFS mount (I am unable to verify this myself):
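To illustrate the accumulation step mentioned above, here is a minimal sketch in C. It is not dt's actual code: `pass_stats`, `total_stats`, `accumulate_pass`, and `format_totals` are hypothetical names, and the fields are assumed only for the example. The idea is that the counters of each pass, including an interrupted one, are folded into a running total so that something other than the hung thread can still report it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-pass counters; dt's real structures differ. */
struct pass_stats {
    uint64_t bytes_read;
    uint64_t bytes_written;
    uint64_t records;
    uint64_t errors;
};

/* Hypothetical running totals across all passes. */
struct total_stats {
    uint64_t bytes_read;
    uint64_t bytes_written;
    uint64_t records;
    uint64_t errors;
    uint64_t passes;
};

/* Fold one pass (complete or interrupted) into the running totals. */
void accumulate_pass(struct total_stats *t, const struct pass_stats *p)
{
    t->bytes_read    += p->bytes_read;
    t->bytes_written += p->bytes_written;
    t->records       += p->records;
    t->errors        += p->errors;
    t->passes        += 1;
}

/* Render the totals into buf; returns what snprintf() returns. */
int format_totals(char *buf, size_t len, const struct total_stats *t)
{
    return snprintf(buf, len,
                    "passes=%llu records=%llu bytes_read=%llu "
                    "bytes_written=%llu errors=%llu",
                    (unsigned long long)t->passes,
                    (unsigned long long)t->records,
                    (unsigned long long)t->bytes_read,
                    (unsigned long long)t->bytes_written,
                    (unsigned long long)t->errors);
}
```

In this sketch a hook that runs before the thread is cancelled would call `accumulate_pass()` once for the partial pass and then print the string produced by `format_totals()`.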
robin@LAPTOP-BJH5MV95 ~/GitHub/dt $ diff -cw dt.c-orig dt.c
*** dt.c-orig 2023-06-09 10:01:57.428606800 -0700
--- dt.c 2023-08-18 10:26:33.784023500 -0700
*** 1,6 ****
/****
*** 31,36 ****
--- 31,41 ----
*** 7809,7814 ****
--- 7814,7823 ----
  } else if (dip->di_history_dumping == True) {
      Wprintf(dip, "History is being dumped, so not cancelling thread!\n");
  } else {
robin@LAPTOP-BJH5MV95 ~/GitHub/dt $
Please let me know if this works, then I'll check this in and update the dt version, thanks!
hmmm... the code changes may have been munged in the comments, please advise if you need the updated dt.c
This change looks good! thanks for working on this!
Thanks John, I'll check in this change and bump the version accordingly.
Cheers, Robin
Hey Robin, just wanted to check when you think you will be able to commit this?
These changes have now been pushed, thanks!
Commits on Sep 14, 2023
Merge branch 'master' of github.com:RobinTMiller/dt (RobinTMiller committed 3 weeks ago)
Report total statistics before cancelling hung threads. (RobinTMiller committed 3 weeks ago)
When running dt against an NFS mount (see the full CLI line below), if the NFS mount goes offline or becomes unresponsive during the test, the thread running the test hangs and is eventually killed by a timeout. Because the thread has been killed, the Total Stats information it should have printed is never output.
Is there a way to still allow the thread to print out the stats? Or hand off the needed information to another thread so it can print off the stats?
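One way the hand-off asked about here could look is sketched below in C. This is an assumption-laden illustration, not how dt is implemented: `shared_stats`, `stats_snapshot`, `stats_init`, `record_io`, and `check_hung` are hypothetical names. The I/O thread publishes its counters through C11 atomics after every completed request, so a monitoring thread can copy them out and print them without any cooperation from a thread that is stuck in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical shared block: the I/O thread writes, the monitor reads. */
struct shared_stats {
    atomic_uint_fast64_t bytes_done;    /* bytes transferred so far */
    atomic_uint_fast64_t records_done;  /* requests completed so far */
    atomic_int_fast64_t  last_io_time;  /* time of the last completed I/O */
};

/* Plain copy that the monitor can print on the worker's behalf. */
struct stats_snapshot {
    uint64_t bytes_done;
    uint64_t records_done;
    int64_t  idle_secs;
};

void stats_init(struct shared_stats *s, time_t now)
{
    atomic_init(&s->bytes_done, 0);
    atomic_init(&s->records_done, 0);
    atomic_init(&s->last_io_time, (int_fast64_t)now);
}

/* Called by the I/O thread after each completed request. */
void record_io(struct shared_stats *s, uint64_t nbytes, time_t now)
{
    atomic_fetch_add(&s->bytes_done, nbytes);
    atomic_fetch_add(&s->records_done, 1);
    atomic_store(&s->last_io_time, (int_fast64_t)now);
}

/*
 * Called by the monitor thread. Returns true and fills *out when no I/O
 * has completed for at least timeout_secs, meaning the worker looks hung
 * and the monitor should report the totals before cancelling it.
 */
bool check_hung(struct shared_stats *s, time_t now, int64_t timeout_secs,
                struct stats_snapshot *out)
{
    int64_t idle = (int64_t)now - (int64_t)atomic_load(&s->last_io_time);

    if (idle < timeout_secs)
        return false;

    out->bytes_done   = atomic_load(&s->bytes_done);
    out->records_done = atomic_load(&s->records_done);
    out->idle_secs    = idle;
    return true;
}
```

The three counters are read one at a time, so a snapshot taken while the worker is still running may be slightly inconsistent. For a worker that is genuinely hung, the values no longer change and the snapshot is stable.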
dt version:
25.03
OS: Linux (RHEL)
CLI line:
dt enable=noprog,microdelay noprogt=1 iobehavior=dt flags=direct bs=random pattern=incr of=/t/vol_nfs_1/dt_1689757132 runtime=300 rdelay=40000 wdelay=40000 limit=4636024847 dispose=delete iotype=random
Log demonstrating error: