Closed: cmazzoli closed this issue 7 years ago.
After Dan and Matt's analysis showed that it was a sqlite problem, Matt solved it by unmounting and remounting the NFS share. For us, it is necessary to reboot the workstation. Thanks to you all.
Debugged with Claudio and Dan at the beamline...
Eventually narrowed it down to sqlite hanging on a db insert into the RunEngine history databases in nfshome:/home/xf23id-ws3 (but the same thing worked fine on both adjacent workstations).
My theory is the NFS client somehow got into some wonky state where it wasn't able to do file locking. Odd behavior in some other apps, and when trying to close apps and log out, sort of supports the theory.
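If it ever helps to confirm quickly whether the problem is back, here is a minimal sketch of a write probe like the insert that was hanging. It only assumes Python's stdlib `sqlite3`; the path is a placeholder, not the real history database, and a broken NFS lock typically surfaces as `sqlite3.OperationalError: disk I/O error`:

```python
import os
import sqlite3
import tempfile


def probe_sqlite(path):
    """Attempt a trivial create/insert/commit at `path`.

    Returns "ok" on success; on a wedged NFS mount the commit usually
    fails with OperationalError ("disk I/O error" or "database is locked").
    """
    try:
        con = sqlite3.connect(path, timeout=5)
        con.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")
        con.execute("INSERT INTO probe VALUES (1)")
        con.commit()
        con.close()
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"sqlite failure: {exc}"


if __name__ == "__main__":
    # Point this at a file on the suspect NFS mount to test it;
    # a temp dir is used here only so the sketch runs anywhere.
    dbfile = os.path.join(tempfile.mkdtemp(), "probe.sqlite")
    print(probe_sqlite(dbfile))
```

Run against local disk it should print "ok"; run against the wedged mount it would print the locking error instead.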
Logged out of xf23id1, logged in as root (on a text console), killed all remaining processes using nfshome, restarted the nfs-common and autofs services, and all seems well... If we see this error again, a reboot is sadly the best fix. Andi said they've run into this before and a reboot fixed it.
And digging through the logs I see it happened a few times on June 17.
I'll do a bit more research to try to find out if this is a known issue (with a known fix).
For those finding this through NSLS-II Trac issue #1260, note also the matching GitHub issue (https://github.com/nsls-ii/bug-reports/issues/172).
I think we could close the GitHub issue; this seems clearly more of a sysadmin issue.
-matt
I lost all the information stored in history, apparently... like scan numbers and so on. Could somebody help with this, please??
You could try restoring the backup .sqlite files? Looking at the file size will probably tell you whether any have useful contents.
(Reply sent by email to cmazzoli's comment of Jul 5, 2017 on NSLS-II/Bug-Reports "Disk I/O error from BSUI" (#172).)
Thanks. Mmmm... I re-entered by hand the last scan_id we thought we had done on the beamline, but I was wondering whether anything else is missing that I haven't spotted yet. Any way of checking?
The transient scan_id is the only thing the RunEngine automatically puts in / updates. Everything else is user metadata and, IIRC, is set by the start-up scripts.
One way to check is to look at what is in `db[-1].start` (or maybe a bit further back) and verify the scan_id and the other metadata.
The current setup of the 'collection' IPython profile for 'xf23id1' depends on two different NFS servers (nfshome and nfsapp), bouncing between them three times via symlinks, with two autofs mount points involved for good measure!
I love the Tom and Jerry episodes where Tom builds Rube Goldberg mousetraps, but... https://www.google.com/search?q=tom+and+jerry+rube+goldberg+mousetrap&tbm=vid
This setup would benefit from simplifying the mount/symlink chain and cleaning up all the vestigial cruft.
-matt
```
[xf23id1@xf23id-ws3 ~]$ pwd
/home/xf23id1
[xf23id1@xf23id-ws3 ~]$ df /home
Filesystem 1K-blocks  Used Available Use% Mounted on
auto.home          0     0         0    - /home
[xf23id1@xf23id-ws3 ~]$ df /home/xf23id1
Filesystem                  1K-blocks        Used  Available Use% Mounted on
10.0.137.54:/home/xf23id1 19314530304 15737658376 3576871928  82% /home/xf23id1
[xf23id1@xf23id-ws3 ~]$ host 10.0.137.54
54.137.0.10.in-addr.arpa domain name pointer nfshome.cs.nsls2.local.
[xf23id1@xf23id-ws3 ~]$ ls -ald .ipython
drwxr-xr-x 6 xf23id1 121 Oct  6  2015 .ipython
[xf23id1@xf23id-ws3 ~]$ ls -ald .ipython/*collection
lrwxrwxrwx 1 xf23id1 41 Sep 24  2015 .ipython/profile_collection -> /epics/xf/23id/ipython/profile_collection
[xf23id1@xf23id-ws3 ~]$ df /epics/
Filesystem 1K-blocks     Used Available Use% Mounted on
/dev/md2   384233532 20272936 344419556   6% /
[xf23id1@xf23id-ws3 ~]$ ls -ald /epics
drwxr-xr-x 3 root 4.0K Jan 22  2015 /epics
[xf23id1@xf23id-ws3 ~]$ ls -ald /epics/xf
drwxr-xr-x 2 root 4.0K Jan 22  2015 /epics/xf
[xf23id1@xf23id-ws3 ~]$ ls -ald /epics/xf/23id
lrwxrwxrwx 1 root 14 Jan 22  2015 /epics/xf/23id -> /direct/xf23id
[xf23id1@xf23id-ws3 ~]$ df /direct/
Filesystem  1K-blocks Used Available Use% Mounted on
auto.direct         0    0         0    - /direct
[xf23id1@xf23id-ws3 ~]$ ls -ald /direct/
drwxr-xr-x 3 root 0 Jul  5 15:18 /direct/
[xf23id1@xf23id-ws3 ~]$ ls -ald /direct/xf23id
drwxr-xr-x 9 root 120 Aug  8  2016 /direct/xf23id
[xf23id1@xf23id-ws3 ~]$ df /direct/xf23id
Filesystem           1K-blocks       Used  Available Use% Mounted on
10.0.137.52:/xf/23id 8781121536 7753373024 1027748512  89% /direct/xf23id
[xf23id1@xf23id-ws3 ~]$ host 10.0.137.52
52.137.0.10.in-addr.arpa domain name pointer nfsapp.cs.nsls2.local.
[xf23id1@xf23id-ws3 ~]$ ls -ald /direct/xf23id/ipython
drwxrwsrwx 29 xf23id1 4.0K Jul  5 14:53 /direct/xf23id/ipython
[xf23id1@xf23id-ws3 ~]$ ls -ald /direct/xf23id/ipython/profile_collection
lrwxrwxrwx 1 xf23id1 50 Mar  5  2016 /direct/xf23id/ipython/profile_collection -> /home/xf23id1/xf23id1_profiles/profile_collection/
[xf23id1@xf23id-ws3 ~]$ ls -ald /home/xf23id1
drwxr-xr-x 97 xf23id1 28K Jul  6 10:21 /home/xf23id1
[xf23id1@xf23id-ws3 ~]$ ls -ald /home/xf23id1/xf23id1_profiles
drwxrwxr-x 4 xf23id1 42 Mar  5  2016 /home/xf23id1/xf23id1_profiles
[xf23id1@xf23id-ws3 ~]$ ls -ald /home/xf23id1/xf23id1_profiles/profile_collection
[xf23id1@xf23id-ws3 ~]$ find /home/xf23id1/xf23id1_profiles/profile_collection -xdev -type l
[xf23id1@xf23id-ws3 ~]$
```