This repo contains the files for Project 4 for DSBA 6190 (Intro to Cloud Computing) at UNC Charlotte for the Spring 2020 semester. The goal of this project was to construct a serverless process flow using AWS Lambda functions. The project also involved integrating several other AWS services, including S3 (storage), DynamoDB (non-relational database), SQS (messaging), and Comprehend (natural language processing).
The following diagram outlines the general process flow, with all of the AWS components represented. Created with CloudCraft.co
As a precursor to the process flow functioning, a table was populated in DynamoDB. The table was simple: just six items, each with a single attribute. For this project I chose well-known 100+ mile ultramarathon races, but the actual content is not important. All that is necessary is that each item in the DynamoDB table has a Wikipedia entry, and that the item's format matches the title of that Wikipedia entry. We'll discuss why when breaking down the specific Lambda function actions.
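For illustration, a table like this could be populated with a short boto3 script. The table name (`ultra-races`), attribute name (`Race`), and the specific race names below are assumptions for this sketch, not necessarily what the repo uses:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ultra-races")  # assumed table name

# Each item is a single attribute whose value must exactly match the
# title of a Wikipedia entry, e.g. "Vermont 100 Mile Endurance Run".
races = [
    "Vermont 100 Mile Endurance Run",
    "Western States Endurance Run",
    "Hardrock Hundred Mile Endurance Run",
]

for race in races:
    table.put_item(Item={"Race": race})
```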
The core of this pipeline is the two Lambda functions, referred to in this project as the Producer and Consumer functions.
The Producer function reads data into the pipeline and then sends out an SQS message stating what was read. In this case, the function reads row by row from the DynamoDB table.
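A minimal sketch of what such a Producer could look like, reusing the assumed table name from above and a placeholder queue URL (the repo's actual function may differ):

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")

TABLE_NAME = "ultra-races"  # assumed table name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/race-queue"  # placeholder

def lambda_handler(event, context):
    """Read each row of the DynamoDB table and emit one SQS message per row."""
    rows = dynamodb.Table(TABLE_NAME).scan()["Items"]
    for row in rows:
        # Each message body mirrors a table item, e.g. {"Race": "..."}
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(row))
    return {"rows_sent": len(rows)}
```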
Note: While not documented in the repo, in the AWS pipeline the function is triggered by a CloudWatch Event set up to activate once per minute.
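Since that trigger isn't part of the repo, here is one possible way such a schedule could be created with boto3's CloudWatch Events client; the rule name and function ARN are placeholders:

```python
import boto3

events = boto3.client("events")

# Placeholder rule name and function ARN.
events.put_rule(Name="producer-every-minute",
                ScheduleExpression="rate(1 minute)")
events.put_targets(
    Rule="producer-every-minute",
    Targets=[{"Id": "producer",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:producer"}],
)
# The Producer Lambda also needs a resource-based permission allowing
# events.amazonaws.com to invoke it (via lambda add_permission).
```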
The Consumer function reads the SQS message sent by the Producer function, processes the information contained in the message, and then saves the processed data as a CSV file in a designated S3 bucket.
To process the data, the function extracts the body of the SQS event payload received by the Lambda function. In this case, the event body has the following structure, in JSON format: `{"Race": "<race name>"}`, where the value is the race name read from the DynamoDB table.
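A rough sketch of a Consumer under these assumptions: a placeholder bucket name, Wikipedia's REST summary endpoint used to fetch the entry's first sentence, and a simple CSV layout. The repo's actual implementation may differ:

```python
import csv
import io
import json
import urllib.parse
import urllib.request

import boto3

comprehend = boto3.client("comprehend")
s3 = boto3.client("s3")

BUCKET = "project4-output-bucket"  # placeholder bucket name

def lambda_handler(event, context):
    """For each SQS record: fetch the matching Wikipedia entry, run
    Comprehend entity detection on its first sentence, write a CSV to S3."""
    for record in event["Records"]:
        race = json.loads(record["body"])["Race"]

        # Wikipedia REST summary endpoint; this is why the DynamoDB value
        # must match the Wikipedia entry title exactly.
        url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
               + urllib.parse.quote(race.replace(" ", "_")))
        with urllib.request.urlopen(url) as resp:
            extract = json.load(resp)["extract"]
        first_sentence = extract.split(". ")[0]  # crude first-sentence split

        entities = comprehend.detect_entities(
            Text=first_sentence, LanguageCode="en")["Entities"]

        # One CSV row per detected entity.
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["Text", "Type", "Score"])
        for entity in entities:
            writer.writerow([entity["Text"], entity["Type"], entity["Score"]])

        s3.put_object(Bucket=BUCKET,
                      Key=race.replace(" ", "_") + ".csv",
                      Body=buf.getvalue())
```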
Note: While not documented in the repo, in the AWS pipeline the function is triggered by an SQS event set up to read the SQS queue being populated by the Producer function. The Consumer function activates upon reading a new SQS message.
To clearly show the pipeline results, the following is the input and output for the Vermont 100 Mile Endurance Run, one of the items in the initial DynamoDB table. As you can see, AWS Comprehend is able to detect entities and identify their types in the first sentence of the Wikipedia entry.
https://en.wikipedia.org/wiki/Vermont_100_Mile_Endurance_Run (Accessed on 3/17/2020)