EPiCs-group / obelix

An automated workflow for the generation & analysis of complexes containing bidentate ligands
GNU General Public License v3.0

Write a Lambda function that is triggered when data is uploaded to the S3 bucket #28

Open Selkubi opened 2 months ago

Selkubi commented 2 months ago

The function must be triggered when the S3 bucket receives new data, and should do the following:

  1. Get the uploaded CSV
  2. Parse the CSV into its columns
  3. Create a unique experiment ID, since every new upload is a new experiment
  4. Send an email with the uploaded file info and the matching new experiment ID

The metadata will be processed the same way once the experiment ID has been emailed to the user. The user must put the experiment ID into the metadata file so that the metadata and the experiment can be matched once they are in DynamoDB.
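The flow above could be sketched roughly as below. This is only an illustration, not the attached function: the email addresses, the experiment-ID format, and the helper names are all placeholders, and the S3/SES calls assume the Lambda role has the corresponding permissions.

```python
import csv
import io
import uuid
from datetime import datetime, timezone


def make_experiment_id() -> str:
    # Timestamp plus a short random suffix; the exact format is an
    # assumption -- any scheme that is unique per upload would do.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"exp-{stamp}-{uuid.uuid4().hex[:8]}"


def parse_csv_columns(body: str, delimiter: str = ",") -> list[str]:
    # The first row of the CSV is assumed to be the header.
    return next(csv.reader(io.StringIO(body), delimiter=delimiter))


def lambda_handler(event, context):
    import boto3  # imported inside the handler so the helpers above stay testable offline

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    columns = parse_csv_columns(body)
    experiment_id = make_experiment_id()

    # Placeholder addresses; SES requires verified sender identities.
    boto3.client("ses").send_email(
        Source="noreply@example.com",
        Destination={"ToAddresses": ["user@example.com"]},
        Message={
            "Subject": {"Data": f"New experiment {experiment_id}"},
            "Body": {"Text": {"Data": f"File {key} uploaded. Columns: {', '.join(columns)}"}},
        },
    )
    return {"experiment_id": experiment_id, "columns": columns}
```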

Selkubi commented 2 months ago

Current status: the Lambda function Magno sent works with the S3Access tests, but the setup does not work when data is actually uploaded to the S3 bucket.
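For what it's worth, a common cause of "console tests pass but real uploads never trigger the function" is a missing resource-based policy letting S3 invoke the Lambda (and/or a missing event-notification configuration on the bucket). A sketch of the permission call, with all names hypothetical:

```python
def s3_invoke_permission(function_name: str, bucket: str, account_id: str) -> dict:
    """Keyword arguments for lambda_client.add_permission(...) that grant the
    bucket's S3 event notifications the right to invoke the function."""
    return {
        "FunctionName": function_name,
        "StatementId": f"s3-invoke-{bucket}",
        "Action": "lambda:InvokeFunction",
        "Principal": "s3.amazonaws.com",
        "SourceArn": f"arn:aws:s3:::{bucket}",
        "SourceAccount": account_id,  # restricts the grant to our own bucket
    }


# Usage (requires AWS credentials; function/bucket/account are placeholders):
# import boto3
# boto3.client("lambda").add_permission(
#     **s3_invoke_permission("my-upload-function", "my-upload-bucket", "123456789012"))
```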

Selkubi commented 1 month ago

Function no. 1 now works by itself with the tests, but the access error persists. Magno is looking into it.

Selkubi commented 1 month ago

Status update: the function works up until the "+,+,+_octant" column; we need to agree on a naming convention here. The original CSV files come with "+,+,+_octant" (quoted). But when I convert the example_dataset xlsx file @akalikadien sent to a CSV, the double quotes are removed and the delimiter is switched to ";" (maybe this is specific to my PC?). In that case I can read in the column names without any problems.

So the question is: should I proceed with "+,+,+_octant" (quoted) as the column name and "," as the delimiter, or switch to +,+,+_octant (unquoted) as the column name and ";" as the delimiter?
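For what it's worth, both variants parse to the same column name with Python's `csv` module, as long as the reader's delimiter matches the file, so the choice mainly affects the file format rather than the parsed result. A small sketch (the sample values are made up):

```python
import csv
import io

# Quoted header with "," as delimiter -- the quotes protect the embedded commas.
quoted_comma = 'id,"+,+,+_octant"\n1,0.5\n'
# Unquoted header with ";" as delimiter -- the commas are now just characters.
unquoted_semi = "id;+,+,+_octant\n1;0.5\n"


def header(text: str, delimiter: str) -> list[str]:
    """Return the first (header) row of a CSV string."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


print(header(quoted_comma, ","))   # ['id', '+,+,+_octant']
print(header(unquoted_semi, ";"))  # ['id', '+,+,+_octant']
```

Either way the stored column name ends up as `+,+,+_octant`; the breakage only happens when the reader's delimiter does not match the file's.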

Selkubi commented 1 month ago

Some notes on what to consider after the handover of the function

Selkubi commented 1 month ago

I have completed the conversion of the first Lambda function (attached as .txt, since .py is not accepted). For now it passes a test event simulating an upload of the "clean_Rh_ligand_NBD_DFT_descriptors_v9.csv" file. aws_S3_to_dynamodb_function.txt
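One detail worth flagging for the DynamoDB side (this is a generic sketch, not taken from the attached function; the table layout and key names are assumptions): boto3's `Table.put_item` rejects Python floats, so numeric CSV values have to be converted to `Decimal` before writing.

```python
from decimal import Decimal


def to_dynamodb_item(experiment_id: str, source_file: str, row: dict) -> dict:
    """Shape one parsed CSV row into a DynamoDB item, converting floats to
    Decimal (boto3's Table.put_item does not accept Python floats).
    'experiment_id' as the partition key is a hypothetical table layout."""
    item = {"experiment_id": experiment_id, "source_file": source_file}
    for key, value in row.items():
        item[key] = Decimal(str(value)) if isinstance(value, float) else value
    return item


# Usage in the handler (hypothetical table name, requires AWS credentials):
# import boto3
# table = boto3.resource("dynamodb").Table("experiments")
# table.put_item(Item=to_dynamodb_item(experiment_id, key, row))
```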