This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it) see the eWaterCycle documentation.
For instructions on how to use the machine as deployed by this repo see the User guide.
These instructions assume you have some basic knowledge of Vagrant and Ansible.
The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires that you have a research budget with SURF; for more info see the SURF website. Once the machine is running, access to it can be shared with anyone.
The setup instructions in this repo will create an eWaterCycle application (a sort-of VM template) that, when started, will create a machine with:
An application on the SURF Research Cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).
In addition to the standard VM storage, additional read-only datasets are mounted at /mnt/data
from dCache using rclone. They may contain things like:
Previously the eWaterCycle platform consisted of multiple VMs on the SURF HPC Cloud; see the v0.1.2 release for that code.
Deploying a local test VM is mostly useful for developing the SURF Research Cloud applications. This Vagrant setup creates a virtual machine with 8 GB memory, 4 virtual cores, and 70 GB storage. This should work on any Linux or Windows machine.
To set up an Explorer/Jupyter server on your local machine with Vagrant and Ansible:
Create the config file research-cloud-plugin.vagrant.vars with the following content:
---
dcache_ro_token: <dcache macaroon with read permission>
rclone_cache_dir: /data/volume_2
# Directory where /home should point to
alt_home_location: /data/volume_3
The token can be found in the eWaterCycle password manager.
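The config file above can also be generated from a script, which avoids pasting the token into an editor. The following is a minimal sketch, not part of the repo: the function name `write_vagrant_vars` is hypothetical, and the directory values are simply copied from the example above.

```shell
# Hypothetical helper: write the Vagrant vars file with the given token.
# Usage: write_vagrant_vars <dcache read-only macaroon> [target file]
write_vagrant_vars() (
    token="$1"
    target="${2:-research-cloud-plugin.vagrant.vars}"
    # The file holds a secret, so create it readable by the owner only.
    umask 077
    cat > "$target" <<EOF
---
dcache_ro_token: "$token"
rclone_cache_dir: /data/volume_2
# Directory where /home should point to
alt_home_location: /data/volume_3
EOF
)

# Example (token value is a placeholder):
#   write_vagrant_vars "<dcache macaroon with read permission>"
```

The function body runs in a subshell, so the changed umask and the helper variables do not leak into the calling shell.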
vagrant --version
# Vagrant 2.4.1
vagrant plugin install vagrant-vbguest
# Installed the plugin 'vagrant-vbguest (0.32.0)'
vagrant up
Visit site
# Get ip of server with
vagrant ssh -c 'ifconfig eth1'
Go to http://<ip of eth1> and log in with vagrant:vagrant.
You will get some warnings about insecure serving; this is OK for local testing and will not happen on Research Cloud.
WSL2 users should follow steps on https://www.vagrantup.com/docs/other/wsl.
Importantly:
export PATH="$PATH:C:\Program Files\Oracle\VirtualBox"
vagrant up --provider virtualbox
This chapter is intended for catalog item developers.
On the Research Cloud a developer can add a catalog item for other people to use. The generic steps to do this are documented here.
For the eWaterCycle component the following specializations were made:
https://github.com/eWaterCycle/infra.git as repository URL
research-cloud-plugin.yml as script path
main as tag
dcache_ro_token parameter for the dcache read-only token, aka macaroon. The token can be found in the eWaterCycle password manager. This token has an expiration date, so it needs to be updated every now and then.
alt_home_location parameter with value /data/volume_2. For the mount point of the storage item which should hold the home directories.
rclone_cache_dir parameter with value /data/volume_3. For the directory where rclone can store its cache.
rclone_max_gsize with value 45. For the maximum size of the cache on the rclone_cache_dir volume, in GB.
https://github.com/eWaterCycle/infra
For the eWaterCycle catalog item the following specializations were made:
https://github.com/eWaterCycle/infra
SURF HPC Cloud as cloud provider
co_irods to false as we do not use irods
co_research_drive to false as we do not use research drive
Webinterface (https:), so clicking on the ACCESS button will open up the eWaterCycle experiment explorer web interface
To become root on a VM the user needs to be a member of the src_co_admin group on SRAM. See docs.
This chapter is intended for application deployers.
ewatercycle-nlesc
For a new CO make sure
End users should be invited to the CO so they can log in.
See the User guide for what users have to do to log in, or use the GitHub repository.
To get example notebooks, end users should use the following URL (replace <workspace id> with the ID of your currently running workspace):
https://<workspace id>.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle&urlpath=lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb&branch=main
TODO add this link to home page of server at
This link uses nbgitpuller to sync a git repo and open a notebook in it.
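Such a link can also be assembled in a script, which helps when pointing users at a different notebook or branch. The sketch below is an illustration only: the function names `urlencode` and `nbgitpuller_url` are hypothetical, and the query parameters (repo, urlpath, branch) are the ones visible in the URL above.

```shell
# Percent-encode a string: unreserved characters are kept, the rest is encoded.
urlencode() (
    export LC_ALL=C
    rest="$1"
    out=""
    while [ -n "$rest" ]; do
        tail="${rest#?}"          # everything after the first character
        char="${rest%"$tail"}"    # the first character itself
        rest="$tail"
        case "$char" in
            [a-zA-Z0-9.~_-]) out="$out$char" ;;
            *) out="$out$(printf '%%%02X' "'$char")" ;;
        esac
    done
    printf '%s' "$out"
)

# Build an nbgitpuller link for a workspace.
# Usage: nbgitpuller_url <workspace id> <repo URL> <urlpath> <branch>
nbgitpuller_url() {
    printf 'https://%s.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=%s&urlpath=%s&branch=%s\n' \
        "$1" "$(urlencode "$2")" "$(urlencode "$3")" "$(urlencode "$4")"
}

# Example (workspace id is a placeholder):
#   nbgitpuller_url "<workspace id>" https://github.com/eWaterCycle/ewatercycle \
#       lab/tree/ewatercycle/docs/examples/MarrmotM01.ipynb main
```

The encoder only handles ASCII input correctly, which is sufficient for repository URLs and notebook paths like the ones used here.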
This chapter is intended for application data preparers.
The eWaterCycle system setup requires a lot of data files. For the Research Cloud virtual machines we will mount a dcache bucket.
To fill the dcache bucket you can run
ansible-playbook \
-e cds_uid=1234 -e cds_api_key=<cds api key> \
-e dcache_rw_token=<dcache macaroon with read/write permissions> \
shared-data-disk.yml
Running this script will download all data files to /mnt/data and upload them to dcache.
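Because the playbook needs three secrets, it can help to read them from environment variables and check that they are set before starting a long download. This is a minimal sketch under assumptions: the environment variable names CDS_UID, CDS_API_KEY and DCACHE_RW_TOKEN and the helper functions are hypothetical; only the playbook name and its `-e` variables come from the command above.

```shell
# Print the names of the given environment variables that are unset or empty.
missing_vars() (
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
)

# Run the data playbook, but only if all required values are available.
fill_dcache() {
    missing="$(missing_vars CDS_UID CDS_API_KEY DCACHE_RW_TOKEN)"
    if [ -n "$missing" ]; then
        echo "Missing required variables:" $missing >&2
        return 1
    fi
    ansible-playbook \
        -e "cds_uid=$CDS_UID" -e "cds_api_key=$CDS_API_KEY" \
        -e "dcache_rw_token=$DCACHE_RW_TOKEN" \
        shared-data-disk.yml
}

# Example (values are placeholders):
#   CDS_UID=1234 CDS_API_KEY="<cds api key>" DCACHE_RW_TOKEN="<macaroon>" fill_dcache
```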
The steps above fetch the data from original sources. If you want to sync some files from another location, say, Snellius, you can use rclone directly. In our experience, it works better to sync entire directories than to try and copy single files.
Create the file ~/.config/rclone/rclone.conf
and add the following content:
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read/write permissions>
You can verify your access by running an innocent rclone ls dcache:parameter-sets.
The command to sync directories is rclone copy somedir dcache:parameter-sets/somedir.
Beware that this will overwrite any existing files, if different!
Note: the password manager can be used for exchanging macaroons.
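Since a copy can overwrite files on the remote, it may be worth previewing a transfer first. rclone has a `--dry-run` flag that lists what would be transferred without changing anything. The wrapper below is a small sketch: the function name `preview_sync` and the optional RCLONE variable (to point at a different rclone binary) are hypothetical; the remote name dcache matches the config above.

```shell
# Preview what "rclone copy" would transfer, without touching the remote.
# Usage: preview_sync <local dir> <path inside the dcache remote>
preview_sync() {
    "${RCLONE:-rclone}" copy --dry-run "$1" "dcache:$2"
}

# Example:
#   preview_sync somedir parameter-sets/somedir
```

If the preview looks right, run the same rclone copy command without `--dry-run` to perform the actual transfer.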
Create the file ~/.config/rclone/rclone.conf
and add the following content:
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read permissions>
Install rclone and run the following command to mount dcache at the ~/dcache directory.
mkdir ~/dcache
rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache
In ESMValTool config files you can use ~/dcache/climate-data/obs6 for rootpath:OBS6.
In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub. A project member can create issues here for permission to push images to Docker Hub.
All services are running with systemd. Their logs can be viewed with journalctl.
The log of the Jupyter server for each user can be followed with
journalctl -f -u jupyter-vagrant-singleuser.service
(replace vagrant with your own username)
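The username substitution can be wrapped in a small helper. This sketch assumes that every user's unit follows the jupyter-<username>-singleuser.service pattern seen in the example above; the function names are hypothetical.

```shell
# Build the systemd unit name of a user's Jupyter server.
# Assumption: units are named jupyter-<username>-singleuser.service.
jupyter_unit() {
    printf 'jupyter-%s-singleuser.service\n' "$1"
}

# Follow the Jupyter log of the given user (defaults to the current user).
follow_jupyter_log() {
    journalctl -f -u "$(jupyter_unit "${1:-$(id -un)}")"
}

# Example:
#   follow_jupyter_log vagrant
```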