Closed johnthepink closed 7 years ago
The plan for now is this:
@johnthepink Are you using EE to trigger lambda?
@delianides the thought is to use S3 to trigger it
@delianides Lambda lets you trigger functions when something is added to an S3 bucket, so the idea is to watch the bucket that the raw files are being uploaded to by EE, and trigger the lambda function that way
Ok, good. That's what I thought you meant. Just wanted to be sure.
@johnthepink 😽
```sql
SELECT
  d.entry_id,
  c.channel_id,
  d.field_id_675 AS audio,
  d.field_id_1554 AS sermon_audio
FROM
  exp_channel_data AS d
LEFT JOIN
  exp_channels AS c
  ON d.channel_id = c.channel_id
WHERE
  (c.channel_id = 3 OR c.channel_id = 61) AND d.field_id_1554 = '';
```
- `c.channel_id = 3` is newspring sermons
- `c.channel_id = 61` is fuse sermons
Data Processing for Heighliner
Heighliner currently functions really well as a uniform way to access data from all of our data stores. However, sometimes it will be necessary to make additional modifications to data beyond resolving it to a predictable schema. These types of modifications are additions or supplements to the existing data stores that are necessary for applications querying Heighliner, but are not essential to the data stores/applications being queried.
There are currently two cases we need to handle for data processing, both of which I think we can leverage AWS Lambda functions for.
Updating
Updating an existing data store shouldn't require Heighliner's direct involvement. This type of processing is needed when a new requirement is added to an existing data store, and old data needs to be updated to meet the new requirement. This should take the form of a one-time script that handles updating all old data entries.
Triggering
There may also be instances where putting a new requirement on an existing data store is not possible/ideal. In this case, Heighliner itself will need to trigger the processing of the data, and handle the logic behind providing this data to the requesting application. This processing should not block the response to the application, but may initially return data that is incomplete or not ideal. That way the application can go about its business. The processing should also be done external to the Heighliner application, so as not to consume resources allocated to Heighliner's main concern of responding to client requests.
Triggering this type of processing may look like this:
Actions can probably be handled using AWS Lambda functions. This will require adding a way for Heighliner to trigger these actions. After the Lambda function finishes, it should make the newly generated data available. Because we are assuming that updating the existing data store isn't possible/ideal, we will need another way to make this data available. This may be a good case for a Mongo collection. Files will need to be uploaded to S3, but querying S3 directly doesn't sound ideal. So, files will probably need to live in S3 with pointers in Mongo.
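The S3-plus-Mongo-pointer idea could be sketched like this. The `buildPointerDoc` helper, the collection name, and the document shape are all illustrative assumptions, not an existing Heighliner API:

```javascript
// Sketch: after a Lambda function writes a processed file to S3, store a
// pointer document in Mongo so Heighliner can find it on later requests.
// The document shape here is a guess, not an agreed-upon schema.
function buildPointerDoc(bucket, key, meta) {
  return {
    s3Bucket: bucket,              // where the processed file lives
    s3Key: key,                    // full object key within the bucket
    contentType: meta.contentType, // e.g. "image/jpeg" or "audio/mpeg"
    sourceEntryId: meta.entryId,   // the EE entry the file was derived from
    processedAt: new Date().toISOString(),
  };
}

// The real flow would then do roughly:
//   db.collection("processed_files").insertOne(doc);
const doc = buildPointerDoc("processed-files", "images/123/mobile.jpg", {
  contentType: "image/jpeg",
  entryId: 123,
});
console.log(doc.s3Key); // "images/123/mobile.jpg"
```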
Use Cases
Sermon Audio
I think updating the existing data store is the best solution for this case. We will need to add the new field, write SQL to find the entries that need updating, and run a script that triggers a Lambda function for each one.
Adding the new field should be trivial, but should be done first. Writing the SQL can be done in isolation, and we probably already have that somewhere. I'm not sure where this script needs to be run. We shouldn't need to wait for each Lambda function to finish before running the next, so I think we can run the script and trigger the Lambda functions all at once from a local machine. It would be nice if this were able to handle our future needs for updating EE as well. So, maybe set up a small Node program that runs Sequelize queries and triggers the Lambda functions over the AWS API. Running this program may look like:
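A rough sketch of that small Node program, under stated assumptions: the Lambda function name and payload shape are placeholders, and the real script would pull rows via Sequelize and invoke Lambda through the AWS SDK (sketched in comments rather than executed here):

```javascript
// Sketch of the one-off updater. In the real script the rows would come
// from Sequelize, e.g.:
//   const rows = await sequelize.query(SQL, { type: QueryTypes.SELECT });
// and each payload would be fired off asynchronously via the AWS SDK:
//   new AWS.Lambda().invoke({ FunctionName: "process-sermon-audio",
//     InvocationType: "Event", Payload: JSON.stringify(payload) });
// Function name and payload fields are assumptions, not project values.

// Pure helper: turn one SQL result row into a Lambda payload.
function buildLambdaPayload(row) {
  return {
    entryId: row.entry_id,
    channelId: row.channel_id,
    audioUrl: row.audio, // field_id_675 from the query above
  };
}

const payloads = [
  { entry_id: 1, channel_id: 3, audio: "s3://raw/one.mp3" },
  { entry_id: 2, channel_id: 61, audio: "s3://raw/two.mp3" },
].map(buildLambdaPayload);

console.log(payloads.length); // 2
```

Because each invocation uses `InvocationType: "Event"` (fire-and-forget), the script doesn't have to wait on any individual Lambda to finish.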
The Lambda function should be able to handle the influx of requests without issue. We can write it in Node, test it locally, and deploy it. Resources:
Image Processing
We need to provide compressed images optimized for mobile and other sizes. Expression Engine has a method for providing compressed images, but it lives at the template layer of the application. These images are uploaded to S3 using a naming convention, and are not stored in the database anywhere. We could get into the business of listing and searching S3 for these images, but that feels dirty to me.
For this case, I think we could go the route of updating, triggering, or a combination. It really depends on when we want the images generated, and what should trigger the processing.
Possible places for triggering image processing:
On Expression Engine upload
Triggering image processing on expression engine upload would require tying in to the CEImage plugin. From there, we could take the necessary steps to produce the images we need by triggering a Lambda function, uploading to S3, and storing in Mongo. This will require us to back process our existing images. I don't think this is the best place to trigger the action.
On image added to S3 bucket
Lambda allows the ability to trigger an action when something is added to an S3 bucket. So, we could watch a bucket for images added, and then trigger a Lambda function to process the images, upload to S3, and store in Mongo. Taking this route will require us to back process our images.
When Heighliner doesn't receive necessary images
When Heighliner receives a request for images, and determines it doesn't have the best images, we could trigger a Lambda function at that point, upload to S3, and store in Mongo. This would not require back processing, which I'm not sure is good or bad at this point.
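The on-request flow could look roughly like this. The `findPointer`, `triggerProcessing`, and `originalUrl` helpers are hypothetical stand-ins for the Mongo lookup and Lambda kick-off, sketching only the fire-and-forget behavior described above:

```javascript
// Sketch: when Heighliner can't find a processed image, return the original
// immediately and kick off processing in the background, so the client
// response is never blocked waiting on the Lambda.
function resolveImage(entryId, store) {
  const pointer = store.findPointer(entryId); // hypothetical Mongo lookup
  if (pointer) return pointer.url;            // best image already exists

  store.triggerProcessing(entryId);           // hypothetical async Lambda trigger
  return store.originalUrl(entryId);          // fall back to the raw upload
}

// Tiny in-memory stand-in for the Mongo/Lambda plumbing:
const triggered = [];
const store = {
  findPointer: (id) => (id === 1 ? { url: "s3://processed/1.jpg" } : null),
  triggerProcessing: (id) => triggered.push(id),
  originalUrl: (id) => `s3://raw/${id}.jpg`,
};

console.log(resolveImage(1, store)); // "s3://processed/1.jpg"
console.log(resolveImage(2, store)); // "s3://raw/2.jpg", and 2 is queued
```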