EPiCs-group / obelix

An automated workflow for generation & analysis of bidentate ligand containing complexes
https://epics-group.github.io/obelix/
GNU General Public License v3.0

Write the 2nd Lambda function that matches the data through Exp_ID #35

Selkubi opened 1 month ago

Selkubi commented 1 month ago

This function should get the data in metadata format, match it with the Exp_ID that is manually entered in the metadata, and append all the new information under a dictionary called "meta". This way we can separate data and metadata when querying, which is especially helpful when we don't know the names of the attributes that will be used in the queries.
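For illustration, a sketch of the intended item shape (all attribute names besides Exp_ID, and the example values, are made up here; the real columns come from the uploaded CSV):

    # Hypothetical data item before the metadata Lambda runs.
    data_item = {
        "Exp_ID": "EXP_035",   # matched against the Exp_ID entered in the metadata CSV
        "Ligand#": "L1",
        "yield": "87",
    }

    # After the match, the remaining metadata columns are appended under "meta",
    # so data and metadata attributes can be queried separately.
    enriched_item = {
        "Exp_ID": "EXP_035",
        "Ligand#": "L1",
        "yield": "87",
        "meta": {"solvent": "THF", "temperature_C": "25", "date": "2024-01-15"},
    }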

Selkubi commented 1 month ago

Problems and current solutions

1- The S3 bucket does not process UTF-8 CSV as well as plain CSV (at least the ones coming out of my Excel), most likely because Excel's "CSV UTF-8" export prepends a byte order mark. Always use CSV (comma delimited) when converting, if possible; this cannot be fixed just by specifying the encoding in the Lambda function with open(file, mode='r', encoding='utf-8').
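If a UTF-8 export does have to be processed, Python's utf-8-sig codec strips a leading BOM on read. A minimal sketch (the file name is just an example):

    import csv

    # 'utf-8-sig' transparently removes a leading BOM, so the first header
    # is read as 'Exp_ID' rather than '\ufeffExp_ID'.
    with open("metadata_all.csv", mode="r", encoding="utf-8-sig", newline="") as f:
        rows = list(csv.DictReader(f))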

Selkubi commented 1 month ago

Setting up the second Lambda function that attaches the metadata


Step 1: Create an S3 Bucket

  1. Go to the S3 Console: Open the AWS Management Console, then search for and select S3.
  2. Create a New Bucket:
    • Click Create bucket.
    • Enter a unique bucket name (e.g., s3-metadata-obelix).
    • Choose the appropriate region (e.g., eu-central-1).
    • Click Create bucket.
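If you prefer scripting the setup, a rough boto3 equivalent of this step (bucket name and region are the examples above):

    import boto3

    s3 = boto3.client("s3", region_name="eu-central-1")
    # Outside us-east-1, the region must be given as a LocationConstraint.
    s3.create_bucket(
        Bucket="s3-metadata-obelix",
        CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
    )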

Step 2: Create a DynamoDB Table

  1. Go to the DynamoDB console.
  2. Check that the table which will be used for appending data has the correct data model:
    • The partition key is Exp_ID and the sort key is Ligand#, as a string or number depending on your use case (see the creation sketch after this list if the table does not exist yet).
    • Make note of the table name; it will be required during role creation to grant the correct write access.
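A boto3 sketch that creates a table with this key schema (the table name and string key types are assumptions; adjust to your use case):

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="eu-central-1")
    dynamodb.create_table(
        TableName="obelixtest_sort_key",
        KeySchema=[
            {"AttributeName": "Exp_ID", "KeyType": "HASH"},    # partition key
            {"AttributeName": "Ligand#", "KeyType": "RANGE"},  # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "Exp_ID", "AttributeType": "S"},
            {"AttributeName": "Ligand#", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )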

Step 3: Create an IAM Role for Lambda

  1. Go to the IAM Console.
  2. Create a New Role:
    • Select Lambda as the trusted entity type, then click Next: Permissions.
  3. Attach Policies:
    • Attach the following managed policies:
      • AmazonS3ReadOnlyAccess (for read access to the S3 bucket).
      • AmazonDynamoDBFullAccess (for read and write access to DynamoDB).
  4. Create a Custom S3 Policy for Restricted Access:
    • Open the IAM console in another tab and click Policies in the side panel.
    • Click Create policy and switch to the JSON editor.
    • Use this policy to grant the Lambda function access to the specific S3 bucket:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "logs:CreateLogGroup",
              "logs:CreateLogStream",
              "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
          },
          {
            "Effect": "Allow",
            "Action": [
              "s3:GetObject",
              "s3:ListBucket"
            ],
            "Resource": [
              "arn:aws:s3:::<s3-bucket-name>/*",
              "arn:aws:s3:::<s3-bucket-name>"
            ]
          },
          {
            "Effect": "Allow",
            "Action": [
              "dynamodb:*"
            ],
            "Resource": "arn:aws:dynamodb:eu-central-1:058264498638:table/<db-table-name>"
          },
          {
            "Effect": "Allow",
            "Action": [
              "dynamodb:ListTables",
              "dynamodb:Scan",
              "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:eu-central-1:058264498638:table/<db-table-name>/*"
          }
        ]
      }
    • Save the policy and attach it to the IAM role.
    • Make sure to update the S3 bucket names and DynamoDB table names, as well as the account ID in the ARNs, so that they belong to the account making use of the Lambda function.
  5. Name the Role (e.g., lambda_import_csv) and click Create role.
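The same role setup can also be scripted; a rough boto3 sketch, assuming the custom policy JSON above was saved to policy.json and the role already exists (the policy name is illustrative):

    import boto3

    iam = boto3.client("iam")

    # Create the custom policy from the JSON document above.
    with open("policy.json") as f:
        custom = iam.create_policy(
            PolicyName="lambda-s3-dynamodb-obelix",
            PolicyDocument=f.read(),
        )

    # Attach the two managed policies plus the custom one to the role.
    for arn in (
        "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
        "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
        custom["Policy"]["Arn"],
    ):
        iam.attach_role_policy(RoleName="lambda_import_csv", PolicyArn=arn)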

Step 4: Create the Lambda Function

  1. Go to the Lambda Console.
  2. Create a New Lambda Function:
    • Click Create function.
    • Choose Author from scratch.
    • Enter a name (e.g., s3-metadata-obelix).
    • Select Python 3.8 or later as the runtime.
  3. Set Permissions:
    • Under Permissions, select Use an existing role and choose the IAM role created in Step 3 (lambda_import_csv).
    • Click Create function.
  4. Add Lambda Code:
    • Copy the provided Lambda function code into the editor, ensuring it references the correct S3 bucket (s3-metadata-obelix) and DynamoDB table (e.g., obelixtest_sort_key).
    • Click Deploy.
  5. Publish the Function Version:
    • Go to Actions > Publish new version to ensure a versioned deployment of your Lambda function.
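The actual function code is in the attached zip under Step 7; for orientation, a minimal sketch of what such a handler can look like, assuming the keys Exp_ID and Ligand# and the example table name (not the project's exact code):

    import csv
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("obelixtest_sort_key")

    def lambda_handler(event, context):
        s3 = boto3.client("s3")
        rec = event["Records"][0]["s3"]
        obj = s3.get_object(Bucket=rec["bucket"]["name"], Key=rec["object"]["key"])
        # utf-8-sig strips an Excel BOM if one slipped through (see note above).
        body = obj["Body"].read().decode("utf-8-sig")

        for row in csv.DictReader(body.splitlines()):
            exp_id = row.pop("Exp_ID")
            # Find every data item sharing this Exp_ID (the partition key) ...
            matches = table.query(KeyConditionExpression=Key("Exp_ID").eq(exp_id))
            for item in matches["Items"]:
                # ... and nest the remaining metadata columns under "meta".
                table.update_item(
                    Key={"Exp_ID": exp_id, "Ligand#": item["Ligand#"]},
                    UpdateExpression="SET #m = :m",
                    ExpressionAttributeNames={"#m": "meta"},
                    ExpressionAttributeValues={":m": row},
                )
        return {"statusCode": 200}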

Step 5: Set up the S3 Trigger

  1. In the Lambda Function Configuration, go to Add trigger.
  2. Choose S3:
    • Select S3 as the trigger source.
    • Choose the bucket name (e.g., s3-metadata-obelix).
    • Set Event type to All object create events.
  3. Save the Trigger by clicking Add.
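The console route above is the simplest; for completeness, a boto3 sketch of the same wiring (ARNs containing <account-id> are placeholders):

    import boto3

    # S3 must be allowed to invoke the function before the notification is added.
    boto3.client("lambda").add_permission(
        FunctionName="s3-metadata-obelix",
        StatementId="allow-s3-invoke",
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn="arn:aws:s3:::s3-metadata-obelix",
    )

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket="s3-metadata-obelix",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:eu-central-1:<account-id>:function:s3-metadata-obelix",
                "Events": ["s3:ObjectCreated:*"],  # all object create events
            }]
        },
    )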

Step 6: Configure the S3 Bucket Policy

  1. Go to the S3 Bucket Permissions:
    • In the Permissions tab, select Bucket policy.
  2. Add the Bucket Policy:
    • Use the following JSON to allow the Lambda function to access the bucket. Replace the account ID in the role ARN with your own AWS account ID and update the bucket name if necessary:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::058264498638:role/lambda_import_csv"
            },
            "Action": [
              "s3:GetObject",
              "s3:ListBucket"
            ],
            "Resource": [
              "arn:aws:s3:::s3-metadata-obelix",
              "arn:aws:s3:::s3-metadata-obelix/*"
            ]
          }
        ]
      }
  3. Save Changes.
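Equivalently, the policy can be applied with boto3, assuming the JSON above was saved as bucket_policy.json:

    import boto3

    with open("bucket_policy.json") as f:
        boto3.client("s3").put_bucket_policy(
            Bucket="s3-metadata-obelix",
            Policy=f.read(),
        )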

Step 7: Test the Function

  1. Upload a Test CSV File to the S3 bucket (s3-metadata-obelix).
  2. Verify Lambda Execution:
    • Check CloudWatch logs to confirm that the function processed the file.
    • Verify that the expected data has been added to or updated in DynamoDB (obelixtest_sort_key table).
  3. You can also create a test event based on an existing file in the S3 bucket. Copy the code below as a test event into your AWS code editor:
    {
      "Records": [
        {
          "s3": {
            "bucket": {
              "name": "s3-metadata-obelix"
            },
            "object": {
              "key": "metadata_all.csv"
            }
          }
        }
      ]
    }

    S3-to-dynamodb-sns-metadata-f37bb411-26a1-4833-909c-ee93bf358218.zip
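The same test event can also be fired from outside the console; a boto3 sketch using the event above:

    import json
    import boto3

    event = {"Records": [{"s3": {
        "bucket": {"name": "s3-metadata-obelix"},
        "object": {"key": "metadata_all.csv"},
    }}]}

    # RequestResponse waits for the result; the logs still go to CloudWatch.
    resp = boto3.client("lambda", region_name="eu-central-1").invoke(
        FunctionName="s3-metadata-obelix",
        InvocationType="RequestResponse",
        Payload=json.dumps(event),
    )
    print(resp["StatusCode"], resp["Payload"].read())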

Selkubi commented 2 weeks ago

To test querying the DynamoDB table from your own environment, you need boto3. With it you can query by any attribute or key. I've attached an example Python script, ObelixNestedQuery.zip, to do simple queries. Make sure to put your own DynamoDB table name in the table variable and have the right authentication set up to be able to query from your IDE (for VS Code, you can use the AWS CLI extension for this, among others).
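For reference, a minimal sketch of such queries (the table name, Exp_ID value, and nested metadata attribute are placeholders; the attached script is the authoritative version):

    import boto3
    from boto3.dynamodb.conditions import Attr, Key

    table = boto3.resource("dynamodb", region_name="eu-central-1").Table(
        "obelixtest_sort_key"  # replace with your own table name
    )

    # Query by the partition key ...
    by_exp = table.query(KeyConditionExpression=Key("Exp_ID").eq("EXP_035"))

    # ... or scan with a filter on an attribute nested under "meta".
    by_meta = table.scan(FilterExpression=Attr("meta.solvent").eq("THF"))

    print(by_exp["Items"])
    print(by_meta["Items"])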