Data Accelerator: Creates a burst buffer from generic hardware and integrates it with Slurm https://www.hpc.cam.ac.uk/research/data-acc http://www.stackhpc.com
It looks like the 5-minute timeout can fire after only about 1 minute, assuming the timer starts when the mount command is issued. If it starts at the beginning of the session action instead, it fires after about 2 minutes. Here is the relevant information from the logs:
Jan 10 09:50:31 dac-e-16 dacd[8349]: starting action {Uuid:2fd7d0cb-94ff-4794-b8ce-dc1d3aa72365 Session:{Name:6153 [snip]
Jan 10 09:51:25 dac-e-16 dacd[8349]: Mount for: 6153
Jan 10 09:51:25 dac-e-16 dacd[8349]: Mounting 6153 on host: dac-e-16 for session: 6153
Jan 10 09:51:25 dac-e-16 dacd[8349]: SSH to: dac-e-16 with command: mkdir -p /mnt/dac/6153_job
Jan 10 09:51:25 dac-e-16 dacd[8349]: Completed remote ssh run: mkdir -p /mnt/dac/6153_job
Jan 10 09:51:25 dac-e-16 dacd[8349]: SSH to: dac-e-16 with command: mount -t lustre -o flock,nodev,nosuid dac-e-16-opa@o2ib1:/xpgSIqqB /mnt/dac/6153_job
Jan 10 09:52:25 dac-e-16 dacd[8349]: Time up, waited more than 5 mins to complete.
Jan 10 09:52:25 dac-e-16 dacd[8349]: Error in remote ssh run: 'mount -t lustre -o flock,nodev,nosuid dac-e-16-opa@o2ib1:/xpgSIqqB /mnt/dac/6153_job' error: signal: killed
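To make the timing concrete, here is a small sketch that computes the elapsed time from the timestamps in the log lines above. The helper names and the placeholder year are my own assumptions for illustration (the log lines carry no year), and the 300 second figure is only taken from the "more than 5 mins" message, not from the dacd source or configuration.

```python
from datetime import datetime

# Timestamps copied from the dacd log lines above.
ACTION_START = "Jan 10 09:50:31"   # "starting action ..."
MOUNT_START = "Jan 10 09:51:25"    # "SSH to: ... mount -t lustre ..."
TIMEOUT_FIRED = "Jan 10 09:52:25"  # "Time up, waited more than 5 mins ..."

# Taken from the log message text only; an assumption about the intended limit.
EXPECTED_TIMEOUT_SECONDS = 5 * 60

# The log lines carry no year, so a placeholder is used purely for parsing.
PLACEHOLDER_YEAR = 2000


def parse_syslog_time(stamp: str) -> datetime:
    """Parse a 'Mon DD HH:MM:SS' stamp (assumes English month abbreviations)."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole number of seconds between two log timestamps."""
    delta = parse_syslog_time(end) - parse_syslog_time(start)
    return int(delta.total_seconds())


if __name__ == "__main__":
    since_mount = elapsed_seconds(MOUNT_START, TIMEOUT_FIRED)
    since_action = elapsed_seconds(ACTION_START, TIMEOUT_FIRED)
    print(f"since mount command: {since_mount}s")
    print(f"since action start:  {since_action}s")
    print(f"expected timeout:    {EXPECTED_TIMEOUT_SECONDS}s")
```

Either way, the mount is killed well short of the 300 seconds the message claims: 60 seconds after the mount command was issued, or 114 seconds after the action started.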