The following came up during a discussion and I am placing the info here (not something we need in the first iteration).
What are the waiting times for jobs to start due to data availability?
Downloading from the internet
Copying every time from persistent to staging
If the data is being downloaded and we have a list of frequently accessed URLs, we can cache them where possible, e.g. images from a public site used as test data, or genome sequences.
This is also of interest to the security team, e.g. is anyone mining bitcoins?
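The URL caching idea above could be sketched roughly as follows. This is only a minimal illustration, not a proposed design: the function name `cached_fetch`, the cache directory layout, and the use of a SHA-256 hash of the URL as the file name are all assumptions made for the example. It also does no expiry or revalidation, which a real cache would need.

```python
import hashlib
import urllib.request
from pathlib import Path


def _download(url):
    """Default fetcher: download the URL body over the network."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_fetch(url, cache_dir, fetch=_download):
    """Return the local path for `url`, downloading only on a cache miss.

    `cached_fetch` and its parameters are hypothetical names for this sketch.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, fixed-length file name.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        # Write to a temporary name first so a failed download never
        # leaves a partial file that looks like a valid cache entry.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch(url))
        tmp.replace(target)
    return target
```

A job would call `cached_fetch(url, cache_dir)` instead of downloading directly; the second job asking for the same URL gets the local copy. The `fetch` parameter exists so the download step can be swapped out, for example in tests.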
In the case of copying, we can optimize this as well:
Use a faster network
Site replication. We have this for NIRD, e.g. a user running jobs on Fram should use the Tromsø NIRD, not the Trondheim one
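The site replication point could be expressed as a simple lookup from compute cluster to preferred NIRD site. The sketch below is illustrative only: the table name, the function name, and the lowercase site labels are assumptions, and the only pairing taken from the discussion is Fram with Tromsø. Unknown clusters return `None` rather than guessing a site.

```python
# Hypothetical mapping from compute cluster to preferred NIRD replica.
# Only the Fram -> Tromsø pairing comes from the discussion; any other
# entries would need to be filled in by someone who knows the sites.
PREFERRED_NIRD_SITE = {
    "fram": "tromso",
}


def preferred_nird_site(cluster):
    """Return the preferred NIRD site for a cluster, or None if unknown."""
    return PREFERRED_NIRD_SITE.get(cluster.strip().lower())
```

A staging step could consult this before copying, and fall back to whatever the current default is when the function returns `None`.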