

AlphaFold2 Webapp on AWS


AlphaFold2 Webapp on AWS provides a web frontend that lets users run AlphaFold2 or ColabFold through a GUI. Administrators can also build an AlphaFold2 or ColabFold analysis environment with AWS CDK. For more information, see the AWS HPC blog post "Running protein structure prediction at scale using a web interface for researchers".

NOTE: The frontend has two tabs, AlphaFold2 and ColabFold, each with its own page, but only one of them works in a given deployment. If the HeadNode specified during frontend setup runs AlphaFold2, only the AlphaFold2 page works; if it runs ColabFold, only the ColabFold page works.

Prerequisites for development environment

NOTE: We recommend that you follow the steps in the next section to set up your development environment.

Set up your development environment using AWS Cloud9

NOTE: We recommend you create your AWS Cloud9 environment in us-east-1 (N. Virginia) region.

NOTE: If you create an AWS Cloud9 environment using the following commands, the prerequisites above (e.g. AWS CLI / Python / Node.js / Docker) are preconfigured in Cloud9.

  1. Launch AWS CloudShell and run the following commands.
git clone https://github.com/aws-samples/cloud9-setup-for-prototyping
cd cloud9-setup-for-prototyping
  2. To assign an Elastic IP to Cloud9, open params.json in an editor (for example, vim params.json) and change the attach_eip option to true.
  "volume_size": 128,
- "attach_eip": false
+ "attach_eip": true
}
  3. Launch the Cloud9 environment cloud9-for-prototyping.
./bin/bootstrap

NOTE: When the bootstrap process completes, the Elastic IP assigned to Cloud9 is displayed on the screen. Copy this IP address and keep it for the later steps.

Elastic IP: 127.0.0.1 (example)
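
The attach_eip change in step 2 can also be made without opening an editor. The following is a minimal sketch, assuming params.json contains the line exactly as shown in the diff above (`"attach_eip": false`). The helper name enable_attach_eip is hypothetical and is not part of cloud9-setup-for-prototyping.

```shell
# Hypothetical helper: flip "attach_eip" from false to true in a params file.
# Assumes the key appears as `"attach_eip": false`, as in the diff above.
enable_attach_eip() {
  local file="${1:-params.json}"
  # Write to a temporary file first so this works with both GNU and BSD sed.
  sed 's/"attach_eip":[[:space:]]*false/"attach_eip": true/' "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
  # Print the resulting line so the change can be checked by eye.
  grep '"attach_eip"' "$file"
}
```

Run `enable_attach_eip params.json` inside the cloud9-setup-for-prototyping directory before running ./bin/bootstrap, and check that the printed line ends in true.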
  1. Open the AWS Cloud9 console, and open an environment named cloud9-for-prototyping.
  2. On the menu bar at the top of the AWS Cloud9 IDE, choose Window > New Terminal or use an existing terminal window.
  3. In the terminal window, enter the following.
git clone https://github.com/aws-samples/alphafold-protein-structure-prediction-with-frontend-app.git
  4. Go to the alphafold-protein-structure-prediction-with-frontend-app directory.
cd alphafold-protein-structure-prediction-with-frontend-app/

Deploy the application

NOTE: The following commands use the us-east-1 (N. Virginia) region.

NOTE: We recommend that you use the Cloud9 environment for the following steps.

1. Backend

## Build the frontend CDK stack
cd app
npm install
npm run build
## In the CDK source, set c9Eip to the Elastic IP assigned to Cloud9
-const c9Eip = 'your-cloud9-ip'
+const c9Eip = 'xx.xx.xx.xx'
cd ../provisioning
npm install
npx cdk bootstrap
## Set up the network, database, and storage
npx cdk deploy Alphafold2ServiceStack --require-approval never
cd ../
Output:
Alphafold2ServiceStack.AuroraCredentialSecretArn = arn:aws:secretsmanager:us-east-1:123456789012:secret:AuroraCredentialSecretxxxyyyzzz
Alphafold2ServiceStack.AuroraPasswordArn = arn:aws:secretsmanager:us-east-1:123456789012:secret:AuroraPasswordxxxyyyzzz
Alphafold2ServiceStack.ExportsOutputRefHpcBucketxxxyyyzzz = alphafold2servicestack-hpcbucketxxxyyyzzz
Alphafold2ServiceStack.FsxFileSystemId = fs-xxxyyyzzz
Alphafold2ServiceStack.GetSSHKeyCommand = aws ssm get-parameter --name /ec2/keypair/key-xxxyyyzzz --region us-east-1 --with-decryption --query Parameter.Value --output text > ~/.ssh/keypair-alphafold2.pem
...
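## Run the GetSSHKeyCommand value from the output above to save the private key
aws ssm get-parameter --name /ec2/keypair/key-{your key ID} --region us-east-1 --with-decryption --query Parameter.Value --output text > ~/.ssh/keypair-alphafold2.pem
## Restrict the private key's permissions so that SSH accepts it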
chmod 600 ~/.ssh/keypair-alphafold2.pem
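
The key retrieval command above has to be copied out of the deploy output by hand. As a rough sketch, the value of a single output can be pulled from saved output text instead. This assumes you saved the `cdk deploy` output to a file (called deploy.log here, a hypothetical name) and that each output line has the `Stack.Key = value` shape shown above. The helper name stack_output is also hypothetical.

```shell
# Hypothetical helper: print the value of one stack output from saved
# `cdk deploy` output, where each output line looks like `Stack.Key = value`.
stack_output() {
  local key="$1" file="$2"
  # Strip everything up to and including "<Stack>.<key> = " and print the rest.
  sed -n "s/^[^ ]*\.${key} = //p" "$file" | head -n 1
}
```

For example, `stack_output GetSSHKeyCommand deploy.log` prints the `aws ssm get-parameter ...` command. Review the printed command before running it, since it writes to ~/.ssh/keypair-alphafold2.pem.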

2. Set up a cluster managed by AWS ParallelCluster

## Install AWS ParallelCluster CLI
pip3 install aws-parallelcluster==3.7.2 --user
## Set the default region
export AWS_DEFAULT_REGION=us-east-1

## Generate a configuration file for a ParallelCluster cluster
npx ts-node provisioning/hpc/alphafold2/config/generate-template.ts

## Create a ParallelCluster cluster
pcluster create-cluster --cluster-name hpccluster --cluster-configuration provisioning/hpc/alphafold2/config/config.yml
For ColabFold, run these commands instead of the AlphaFold2 commands above:
npx ts-node provisioning/hpc/colabfold/config/generate-template.ts
pcluster create-cluster --cluster-name hpccluster --cluster-configuration provisioning/hpc/colabfold/config/config.yml
## Check the cluster creation status
pcluster list-clusters
Output:
{
  "clusters": [
    {
      "clusterName": "hpccluster",
      ## Wait until CREATE_COMPLETE 
      "cloudformationStackStatus": "CREATE_COMPLETE",
...
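
Cluster creation takes a while, so the status check above usually has to be repeated. The following is a minimal polling sketch, not part of this sample. It assumes `pcluster list-clusters` prints pretty-printed JSON with clusterName before cloudformationStackStatus, as in the output above. The function names stack_status and wait_for_cluster are hypothetical.

```shell
# Hypothetical helper: read `pcluster list-clusters` JSON on stdin and print
# the cloudformationStackStatus of the named cluster (empty if not found).
# Assumes one key per line, with clusterName preceding the status.
stack_status() {
  awk -v name="$1" -F'"' '
    /"clusterName"/ { current = $4 }
    /"cloudformationStackStatus"/ && current == name { print $4; exit }
  '
}

# Poll until the stack reaches CREATE_COMPLETE. Any status other than
# CREATE_IN_PROGRESS (including "cluster not found") stops with an error.
wait_for_cluster() {
  local name="$1" interval="${2:-60}" status
  while true; do
    status="$(pcluster list-clusters | stack_status "$name")"
    echo "$(date +%T) ${name}: ${status:-not found}"
    case "$status" in
      CREATE_COMPLETE) return 0 ;;
      CREATE_IN_PROGRESS) sleep "$interval" ;;
      *) return 1 ;;
    esac
  done
}
```

Calling `wait_for_cluster hpccluster` checks once a minute and returns when the cluster is ready, so the frontend steps can follow it in a script.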

3. Web frontend

## Get the instance ID of the cluster's HeadNode
pcluster describe-cluster -n hpccluster | grep -A 5 headNode | grep instanceId
Output:
"instanceId": "i-{your_headnode_instanceid}",
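
The grep above prints the whole JSON line. If only the bare ID is wanted, for example to paste into the ssmInstanceId setting, a sketch like the following may help. It assumes the pretty-printed `pcluster describe-cluster` layout implied by the command above, with instanceId within five lines of headNode. The function name headnode_instance_id is hypothetical.

```shell
# Hypothetical helper: read `pcluster describe-cluster` JSON on stdin and print
# the first instance ID found within five lines after the "headNode" key.
headnode_instance_id() {
  grep -A 5 '"headNode"' |
    sed -n 's/.*"instanceId": *"\(i-[^"]*\)".*/\1/p' |
    head -n 1
}
```

Usage: `pcluster describe-cluster -n hpccluster | headnode_instance_id`.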
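## In the CDK source, set ssmInstanceId to the HeadNode instance ID from the output above
-const ssmInstanceId = 'your-headnode-instanceid'
+const ssmInstanceId = 'i-{your_headnode_instanceid}'
## Set allowIp4Ranges to the IPv4 CIDR range(s) allowed to access the frontend
-const allowIp4Ranges = ['your-global-ip-v4-cidr']
+const allowIp4Ranges = ['xx.xx.xx.xx/xx']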
## Deploy the frontend CDK stack
cd ~/environment/alphafold-protein-structure-prediction-with-frontend-app/provisioning
npx cdk deploy FrontendStack --require-approval never
Output:
FrontendStack.ApiGatewayEndpoint = https://xxxyyyzzz.execute-api.us-east-1.amazonaws.com/api
FrontendStack.ApiRestApiEndpointXXYYZZ = https://xxxyyyzzz.execute-api.us-east-1.amazonaws.com/api/
FrontendStack.CloudFrontWebDistributionEndpoint = xxxyyyzzz.cloudfront.net
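
The allowIp4Ranges value set above is written in CIDR notation (address/prefix length). When access should be limited to a single address, the prefix is /32. The helper below is an illustrative sketch that validates a dotted IPv4 address and appends /32. The name to_cidr32 is hypothetical, and the sample does not ship such a script.

```shell
# Hypothetical helper: turn a single IPv4 address into a /32 CIDR string,
# rejecting input that is not four dot-separated numbers in the range 0-255.
to_cidr32() {
  local ip="$1" octet
  # Shape check: four dot-separated groups of one to three digits.
  printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Range check: every octet must be at most 255.
  for octet in $(printf '%s' "$ip" | tr '.' ' '); do
    [ "$octet" -le 255 ] || return 1
  done
  printf '%s/32\n' "$ip"
}
```

For example, `to_cidr32 203.0.113.10` prints `203.0.113.10/32`. To look up your own global address, one option is `curl -s https://checkip.amazonaws.com`, run from the network you will browse from rather than from Cloud9.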

4. Set up the HeadNode in your cluster

## SSH login to ParallelCluster's HeadNode using private key
export AWS_DEFAULT_REGION=us-east-1
pcluster ssh --cluster-name hpccluster -i ~/.ssh/keypair-alphafold2.pem
bash /fsx/alphafold2/scripts/bin/app_install.sh
nohup bash /fsx/alphafold2/scripts/bin/setup_database.sh &
For ColabFold, run these commands instead of the AlphaFold2 commands above:
bash /fsx/colabfold/scripts/bin/app_install.sh
sbatch /fsx/colabfold/scripts/setupDatabase.bth

5. Check if the backend works

## SSH login to ParallelCluster's HeadNode using private key
export AWS_DEFAULT_REGION=us-east-1
pcluster ssh --cluster-name hpccluster -i ~/.ssh/keypair-alphafold2.pem
## Check whether the database download has finished
tail -n 8 /fsx/alphafold2/job/log/setup_database.out
Output:
Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
dcfd44|OK  |    66MiB/s|/fsx/alphafold2/database/pdb_seqres/pdb_seqres.txt

Status Legend:
(OK):download completed.
All data downloaded.
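
The database download runs in the background, so the log may need to be checked more than once. The sketch below tests for the final "All data downloaded." line shown above. It assumes that line appears within the last eight lines of the AlphaFold2 log once the setup script has finished. The function name database_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if the setup log exists and its last
# eight lines contain the completion message shown in the output above.
database_ready() {
  local log="${1:-/fsx/alphafold2/job/log/setup_database.out}"
  [ -f "$log" ] && tail -n 8 "$log" | grep -qF 'All data downloaded.'
}
```

Usage on the HeadNode: `database_ready && echo "database is ready"`. The default path is the AlphaFold2 log; the ColabFold setup may log elsewhere.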
## Fetch the FASTA file of your choice (e.g. Q5VSL9)
wget -q -P /fsx/alphafold2/job/input/ https://rest.uniprot.org/uniprotkb/Q5VSL9.fasta

## Start the job using CLI
python3 /fsx/alphafold2/scripts/job_create.py Q5VSL9.fasta
For ColabFold, run these commands instead of the AlphaFold2 commands above:
wget -q -P /fsx/colabfold/job/input/ https://rest.uniprot.org/uniprotkb/Q5VSL9.fasta
python3 /fsx/colabfold/scripts/job_create.py Q5VSL9.fasta
## Check the job status
squeue
Output:
## While running a job
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 1 queue-cpu setupDat   ubuntu CF       0:03      1 queue-cpu-dy-x2iedn16xlarge-1

## Once all the jobs have finished
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
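
As the two outputs above show, the queue is empty when `squeue` prints only its header row. A minimal way to wait for that state is sketched below. The function names count_jobs and wait_for_jobs are hypothetical, and the loop watches every job in the queue rather than one specific job.

```shell
# Hypothetical helper: count job rows in `squeue` output read from stdin,
# skipping blank lines and the header row that starts with JOBID.
count_jobs() {
  awk 'NF > 0 && $1 != "JOBID" { n++ } END { print n + 0 }'
}

# Block until the queue holds no jobs, checking every $1 seconds (default 60).
wait_for_jobs() {
  local interval="${1:-60}"
  while [ "$(squeue | count_jobs)" -gt 0 ]; do
    sleep "$interval"
  done
}
```

Run `wait_for_jobs` on the HeadNode after submitting a job; it returns once squeue lists nothing.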

6. Check if the frontend works

Open the CloudFrontWebDistributionEndpoint address from the FrontendStack output in step 3 in a web browser, and confirm that the page for your HeadNode type (AlphaFold2 or ColabFold) loads.

7. Clean up

When you are done trying out this sample, remove the resources to avoid incurring additional costs. Run the following commands from the Cloud9 terminal.

## Delete the dataset files from HeadNode
export AWS_DEFAULT_REGION=us-east-1
pcluster ssh --cluster-name hpccluster -i ~/.ssh/keypair-alphafold2.pem
rm -fr /fsx/alphafold2/database/
logout

## Delete the cluster
export AWS_DEFAULT_REGION=us-east-1
pcluster delete-cluster -n hpccluster
## Check the name of the CDK stacks (for frontend and backend) and destroy them
cd ~/environment/alphafold-protein-structure-prediction-with-frontend-app/provisioning
npx cdk list
npx cdk destroy FrontendStack
npx cdk destroy GlobalStack
npx cdk destroy Alphafold2ServiceStack