Xing Liu (UC Berkeley) and Anthony Kremin (Berkeley Lab), June 2024
DESI's early data release (EDR) is available to the public, free of charge, at the desidata S3 cloud storage "bucket" on Amazon Web Services (AWS).
Here, we provide a Docker image which makes it easy to work with both local and cloud-hosted DESI data. Our Docker image is a self-contained Linux environment which comes pre-packaged with
Most DESI code developed for NERSC can run on this Docker image with little to no modifications.
You are free to choose a combination of local/cloud-hosted databases and local/cloud-hosted programming environments to suit your workflow.
If your DESI data is hosted locally, or if you want to stream the S3 DESI data to process locally, then please follow the instructions at Running the Docker image locally. We emphasize that local data processing is only practical for those with high-performance computers. Due to the high resolution of DESI data, you should only run the image locally if your computer has at least 16 GB of memory (24 GB recommended).
Otherwise, we recommend running the Docker image at your institution's computing center, or at a commercial cloud computing service such as Amazon Elastic Compute Cloud (EC2). A cloud compute instance gives you on-demand access to additional storage and processing power. AWS EC2, in particular, has a very high-bandwidth internal network connection to AWS S3. If you are interested, then please follow the instructions for Running the Docker image on an AWS EC2 cloud compute instance.
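As a quick way to apply the memory guideline above, the following sketch classifies a Linux machine against the 16 GB minimum and 24 GB recommendation. The function names kb_to_gib and classify_memory are illustrative and not part of the Docker image, and the check assumes /proc/meminfo is available (Linux only).

```shell
# Round a MemTotal value given in kB to the nearest whole GiB.
# Rounding matters: the kernel reserves some RAM, so a 16 GB machine
# reports slightly less than 16 GiB in /proc/meminfo.
kb_to_gib() {
    echo $(( ($1 + 524288) / 1048576 ))
}

# Classify a memory size (in GiB) against the thresholds quoted above.
classify_memory() {
    if [ "$1" -ge 24 ]; then
        echo "recommended"
    elif [ "$1" -ge 16 ]; then
        echo "minimum"
    else
        echo "insufficient"
    fi
}

# /proc/meminfo only exists on Linux; the check is skipped elsewhere.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "Local memory: $(classify_memory "$(kb_to_gib "$total_kb")")"
fi
```

A result of insufficient suggests using a computing center or cloud instance rather than running the image locally.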
We will be using Docker Engine, Docker's command-line tool.
Open your computer terminal, and navigate to the folder you use as your workspace for DESI.
If your DESI data is locally hosted at local_data_path, then enter this command:
docker run -it -p 8888:8888 -e DESI_RELEASE=edr \
--volume "$(pwd):/home/synced" \
--volume "local_data_path:/home/desidata:ro" \
ghcr.io/desihub/desidocker:main
The :ro at the end of the flag mounts the data folder as read-only.
Otherwise, to access the DESI data hosted at AWS S3, enter this command instead:
docker run -it -p 8888:8888 -e DESI_RELEASE=edr \
--volume "$(pwd):/home/synced" \
--cap-add SYS_ADMIN --device /dev/fuse --security-opt apparmor:unconfined \
ghcr.io/desihub/desidocker:main
Once the image starts running, locate the line beginning with http://127.0.0.1:8888/lab?token=... in the output, and open the address in your browser.
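The two commands above differ only in their data-access flags. The sketch below prints the appropriate flags depending on whether a local data folder is supplied. The helper name data_flags is hypothetical. It resolves the folder to an absolute path first, since Docker interprets a non-absolute volume source as a named volume rather than a folder.

```shell
# Print the data-access flags for docker run.
# $1 (optional): folder holding a local copy of the DESI data.
data_flags() {
    if [ -n "$1" ]; then
        # Resolve to an absolute path; fail if the folder is not accessible.
        abs_path=$(cd "$1" 2>/dev/null && pwd) || {
            echo "error: '$1' is not an accessible folder" >&2
            return 1
        }
        # Mount the local data read-only, as in the first command above.
        echo "--volume \"${abs_path}:/home/desidata:ro\""
    else
        # No local copy: print the FUSE flags from the S3 command above.
        echo "--cap-add SYS_ADMIN --device /dev/fuse --security-opt apparmor:unconfined"
    fi
}
```

Calling data_flags with a folder prints the read-only volume flag, and calling it with no argument prints the S3 flags. Either result can be pasted into the docker run command in place of the data-specific line.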
While you do not need an AWS account to access the DESI data locally, you do have to make one in order to use the AWS EC2 service. Follow the official instructions for First time users of AWS to get started. Once you’ve signed into your account, we recommend switching your region to us-west-2 (Oregon), as that is the region of our S3 bucket. Then, you can navigate to Services » EC2 to set up a cloud compute instance.
To access the Jupyter web server provided by our Docker image, first we need to create a security group which allows HTTPS network access.
Navigate to Services » EC2 » Security groups, then click Create security group. Fill in the following fields:
Inbound rules
Type | Protocol | Port range | Source type | Source | Description |
---|---|---|---|---|---|
Custom TCP | (TCP) | 8888 | My IP | (Your IP) | Open TCP port for Jupyter server |
HTTPS | (TCP) | (443) | My IP | (Your IP) | Allow HTTPS for Jupyter server |
SSH | (TCP) | (22) | My IP | (Your IP) | Allow SSH access to the instance |
Outbound rules
Type | Protocol | Port range | Source type | Source | Description |
---|---|---|---|---|---|
All traffic | (All) | (All) | Anywhere-IPv4 | (0.0.0.0/0) | Allow instance to access the whole internet |
Then click Create security group.
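For readers who prefer the command line, the inbound rules above can also be expressed with the AWS CLI. This is only a sketch, assuming the AWS CLI is installed and configured and that the security group already exists. The helper name print_ingress_rules, the group ID and the IP address in the usage example are placeholders. The helper only prints the commands so they can be reviewed before being run.

```shell
# Print the AWS CLI commands that would add the three inbound rules
# (Jupyter on 8888, HTTPS on 443, SSH on 22), restricted to one IP.
# $1: security group ID, $2: your public IPv4 address.
print_ingress_rules() {
    for port in 8888 443 22; do
        echo "aws ec2 authorize-security-group-ingress" \
             "--group-id $1 --protocol tcp --port $port --cidr $2/32"
    done
}
```

For example, print_ingress_rules sg-0123456789abcdef0 203.0.113.7 prints three commands, one per port. The /32 suffix restricts each rule to that single address, mirroring the My IP source type in the console.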
Navigate to Services » EC2 » Instances, then click Launch instances. Fill in the following fields —
Then click Launch instance. After the instance has loaded, follow the official instructions to Connect to your instance.
Run the following lines to install Git and Docker on Amazon Linux, which uses the yum package management system.
# Install Git and Docker
sudo yum update
sudo yum install git
sudo yum install docker
# Give Docker extra permissions
sudo usermod -a -G docker ec2-user
id ec2-user
newgrp docker
sudo systemctl enable docker.service
If you are using a different Linux distribution on your instance, refer to the official instructions to install Docker Engine for Linux instead.
Run this command to start Docker:
sudo systemctl start docker.service
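Before downloading the image, it can help to confirm that the Docker daemon is reachable from your shell. The following sketch uses docker info for that purpose. The function name docker_status is illustrative.

```shell
# Report whether the Docker CLI is installed and can reach the daemon.
# Prints one of: missing, ready, unreachable.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "missing"
    elif docker info >/dev/null 2>&1; then
        echo "ready"
    else
        # The daemon is stopped, or this user lacks permission to use it.
        echo "unreachable"
    fi
}

docker_status
```

A result of unreachable may mean the service has not been started yet, or that the group change from the previous step has not taken effect in the current shell.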
Finally, run this shell command to download and run the image.
docker run -it -p 8888:8888 -e DESI_RELEASE=edr \
-e PUBLIC_IP=$(curl -s https://checkip.amazonaws.com) \
--volume "$(pwd):/home/synced" \
--cap-add SYS_ADMIN --device /dev/fuse --security-opt apparmor:unconfined \
ghcr.io/desihub/desidocker:main
If you encounter an unknown server OS error, you may need to restart Docker.
Once the image starts running, locate the line beginning with http://...:8888/lab?token=... in the output, and open the address in your browser.
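If the startup output is long, the address can be picked out with grep. The sketch below assumes the address appears in the form described above (http://, then a host, then /lab?token= followed by the token). The function name extract_jupyter_url is hypothetical.

```shell
# Read log text on standard input and print the first Jupyter address,
# i.e. the first http:// URL that contains /lab?token=.
extract_jupyter_url() {
    grep -o 'http://[^[:space:]]*/lab?token=[^[:space:]]*' | head -n 1
}
```

One possible use is docker logs CONTAINER 2>&1 | extract_jupyter_url, where CONTAINER is the name or ID reported by docker ps.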
To point $DESI_ROOT to another public data release, replace edr with the other release's name in the -e DESI_RELEASE=edr flag.
To serve Jupyter on a different external port, replace the first 8888 in -p 8888:8888. Adjust the external port (as well as the port security policy if using EC2) should you encounter port collision issues.
To sync a custom folder, replace $(pwd) (which points to the folder where you entered the docker run command) with the absolute path to the custom folder in the --volume "$(pwd):/home/synced" flag.
To build the image yourself instead of downloading it, run
docker build github.com/desihub/desidocker.git --tag desi-docker
Then, replace the tag ghcr.io/desihub/desidocker:main with desi-docker when running the image.
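The substitutions above can be collected into one place. This sketch prints the S3 variant of the docker run command with the release, external port, synced folder and image tag taken from environment variables. DESI_RELEASE is the variable used by the image, while HOST_PORT, SYNC_DIR and DESI_IMAGE are names invented here for illustration. The function only prints the command and does not run it.

```shell
# Print a docker run command for the S3 workflow with overridable settings.
# Defaults match the command given earlier in this document.
desi_run_command() {
    release=${DESI_RELEASE:-edr}
    host_port=${HOST_PORT:-8888}        # external port; the container keeps 8888
    sync_dir=${SYNC_DIR:-$(pwd)}        # folder mounted at /home/synced
    image=${DESI_IMAGE:-ghcr.io/desihub/desidocker:main}
    echo "docker run -it -p ${host_port}:8888 -e DESI_RELEASE=${release}" \
         "--volume \"${sync_dir}:/home/synced\"" \
         "--cap-add SYS_ADMIN --device /dev/fuse" \
         "--security-opt apparmor:unconfined ${image}"
}
```

For example, HOST_PORT=9000 DESI_IMAGE=desi-docker desi_run_command prints a command that uses external port 9000 and the locally built image.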
To update your Docker image, run
docker pull ghcr.io/desihub/desidocker:main
See maintainance.md.