chrisbarnettster closed this 4 years ago
Looking at the Ceph mount: there is nothing in the mount point.
The Ceph mount is available on the Slurm nodes but not on the Galaxy frontend. Tried a remount (didn't work). Tried an unmount (didn't work because Galaxy was keeping the mount busy). Ceph messages in the kernel log:
Mar 6 01:36:19 galaxy-compchem kernel: [2976630.503821] ceph: mds0 hung
Mar 6 01:36:19 galaxy-compchem kernel: [2976630.748057] ceph: mds1 caps went stale, renewing
Mar 6 01:36:19 galaxy-compchem kernel: [2976630.748060] ceph: mds1 caps stale
Mar 6 01:36:24 galaxy-compchem kernel: [2976635.838383] ceph: mds0 came back
Mar 6 01:36:24 galaxy-compchem kernel: [2976635.838386] ceph: mds0 caps went stale, renewing
Mar 6 01:36:24 galaxy-compchem kernel: [2976635.838387] ceph: mds0 caps stale
Mar 6 01:36:24 galaxy-compchem kernel: [2976636.171932] libceph: mds0 10.102.25.18:6800 socket closed (con state OPEN)
Mar 6 01:36:29 galaxy-compchem kernel: [2976640.542578] libceph: mds0 10.102.25.18:6800 connection reset
Mar 6 01:36:29 galaxy-compchem kernel: [2976640.548713] libceph: reset on mds0
Mar 6 01:36:29 galaxy-compchem kernel: [2976640.548715] ceph: mds0 closed our session
Mar 6 01:36:29 galaxy-compchem kernel: [2976640.548717] ceph: mds0 reconnect start
Mar 6 01:36:29 galaxy-compchem kernel: [2976640.552810] ceph: mds0 reconnect denied
Mar 6 01:36:47 galaxy-compchem kernel: [2976658.413893] libceph: mds1 10.102.25.28:6801 socket closed (con state OPEN)
Mar 6 01:36:54 galaxy-compchem kernel: [2976665.897858] libceph: mds1 10.102.25.28:6801 connection reset
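A grep over the ceph: lines is enough to pull only the session-loss events out of a noisy kernel log. This is a minimal sketch. It assumes the kernel log is a plain file such as /var/log/kern.log; that path is an assumption, and on systemd hosts journalctl -k shows the same messages. The function name ceph_session_events is hypothetical. The "reconnect denied" line is the one to look for. It usually means the MDS had already dropped this client's session, so the existing mount does not recover on its own. That is consistent with the empty mount point.

```shell
# Hypothetical helper: print only the MDS session-loss events
# (hung MDS, session closed by the MDS, reconnect denied) from a kernel log file.
ceph_session_events() {
  grep -E 'ceph: mds[0-9]+ (hung|closed our session|reconnect denied)' "$1"
}
```

Usage: ceph_session_events /var/log/kern.log (the log path is an assumption).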
sudo supervisorctl stop galaxy
umount /cchem   # still busy
# force-kill the leftover conda installs from Galaxy tools that are holding the mount: kill $JOBID
umount /cchem
mount /cchem # it works.
sudo supervisorctl start galaxy
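The recovery steps above can be wrapped in a small function so the next occurrence needs only one call. This is a sketch, not a tested runbook. The function name remount_ceph is hypothetical. It assumes Galaxy runs under supervisord as the program galaxy and that the mount point has an /etc/fstab entry, so that mount <dir> works on its own. Instead of looking up the PID by hand, it uses fuser -m to list and then kill the processes still holding the mount.

```shell
# Hypothetical helper wrapping the recovery sequence above.
# Assumes: supervisord program "galaxy", and an /etc/fstab entry for the mount point.
remount_ceph() {
  local mnt="${1:-/cchem}"

  sudo supervisorctl stop galaxy

  if ! sudo umount "$mnt"; then
    # Still busy: show which processes hold the mount (e.g. leftover conda installs)
    sudo fuser -v -m "$mnt"
    # Kill them; -i asks for confirmation before each kill
    sudo fuser -k -i -m "$mnt"
    sudo umount "$mnt" || return 1
  fi

  sudo mount "$mnt" || return 1
  sudo supervisorctl start galaxy
}
```

Usage: remount_ceph /cchem. On a hung Ceph mount, fuser can itself block on the stale filesystem. In that case a lazy unmount (umount -l) is a fallback.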
Galaxy is back up, but browsing to the web server shows an Internal Server Error.
There are permission denied errors on the filesystem for the tools config, then TopLevelLookupException: Cant locate template for uri '/js-app.mako', and the WSGI app crashed.
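Given the permission denied errors after the remount, a first check is which files the Galaxy user can actually read. This is a minimal sketch, assuming GNU find (for -readable) and sudo access. The function name unreadable_by, the user galaxy and the path in the usage line are hypothetical placeholders for the real Galaxy user and install directory.

```shell
# Hypothetical helper: print every path under DIR that USER cannot read.
# Relies on GNU find's -readable test, evaluated as USER via sudo -u.
unreadable_by() {
  local user="$1" dir="$2"
  sudo -u "$user" find "$dir" ! -readable -print 2>/dev/null
}
```

Usage: unreadable_by galaxy /path/to/galaxy. An empty result means the permission errors come from somewhere else, such as a path outside that directory.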