Data access bottlenecks

The following came up during a discussion and I am placing the info here (not something we need in the first iteration).

What is the waiting time for jobs to start due to data availability?
- Downloading from the internet
- Copying every time from persistent to staging
If the data is being downloaded and we have a list of frequently accessed URLs, we can cache them where possible, e.g. images from a public site used as test data, or genome sequences.
- This is of interest to the security team as well, e.g. to see whether anyone is mining bitcoins.
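
A minimal sketch of how the list of frequently accessed URLs could be derived, assuming we can collect a log of `(job id, URL)` pairs. The log format, the `cache_candidates` name, and the `min_hits` threshold are all hypothetical and not part of any existing tool:

```python
from collections import Counter


def cache_candidates(download_log, min_hits=3):
    """Return URLs downloaded at least `min_hits` times, most frequent first."""
    counts = Counter(url for _job_id, url in download_log)
    return [url for url, hits in counts.most_common() if hits >= min_hits]


# Hypothetical log of (job id, URL) pairs; the URLs are placeholders.
log = [
    (1, "https://example.org/genome.fa"),
    (2, "https://example.org/genome.fa"),
    (3, "https://example.org/images.tar"),
    (4, "https://example.org/genome.fa"),
]
print(cache_candidates(log))  # ['https://example.org/genome.fa']
```

The same per-URL counts could be handed to the security team, since an unexpected but frequently fetched URL is worth a second look.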
In the case of copying, we can optimize this as well:
- Use faster network
- Site replication. We have this for NIRD, e.g. if a user is running jobs on Fram, they should use the Tromsø NIRD, not the Trondheim one.
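
The site replication point could be checked per job along these lines. This is only a sketch: the table and the `uses_remote_replica` helper are hypothetical, the site labels are made up for illustration, and only the Fram/Tromsø pairing comes from the note above.

```python
# Hypothetical cluster -> preferred NIRD site table.
# Only the Fram -> Tromsø pairing comes from the note; extend as needed.
PREFERRED_NIRD_SITE = {"fram": "tromso"}


def uses_remote_replica(cluster, nird_site):
    """True if the job reads from a NIRD site other than the preferred one.

    Returns False for clusters with no known preference.
    """
    preferred = PREFERRED_NIRD_SITE.get(cluster.lower())
    return preferred is not None and nird_site.lower() != preferred


print(uses_remote_replica("Fram", "trondheim"))  # True
print(uses_remote_replica("Fram", "tromso"))     # False
```

Jobs flagged this way would be candidates for a hint to the user that a closer replica exists.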

NAICNO / Jobanalyzer