catalyst / moodle-fileconverter_librelambda

A Libre Office document converter for Moodle leveraging AWS Lambda
https://moodle.org/plugins/fileconverter_librelambda

Libre Lambda Document Converter

This is a file converter plugin for the Moodle (https://moodle.org) Learning Management System (LMS). The primary function of this plugin is to convert student submissions into the PDF file format, to allow teachers to use the annotate PDF functionality of Moodle.

More information on the annotate PDF function of Moodle can be found at:

https://docs.moodle.org/36/en/Using_Assignment#Annotating_submissions

This plugin uses Amazon Web Services (AWS) to provide the conversion to PDF. The primary AWS services used are Lambda and S3. The plugin interfaces Moodle with the AWS services. Everything you need to set up both Moodle and AWS is included in this plugin.

The aims of this plugin are to:

The following sections outline the steps to install the plugin and set up Moodle and the AWS architecture to enable document conversion. The installation and setup process has the following steps:

  1. Plugin Installation
  2. Moodle Setup
  3. AWS Stack setup
  4. Plugin Setup

Supported Moodle Versions

This plugin currently supports Moodle:

Plugin Installation

The following steps will help you install this plugin into your Moodle instance.

  1. Clone or copy the code for this repository into your Moodle instance at the following location: <moodledir>/files/converter/librelambda
  2. This plugin also depends on local_aws. Get the code from https://github.com/catalyst/moodle-local_aws and clone or copy it into <moodledir>/local/aws
  3. Run the upgrade: sudo -u www-data php admin/cli/upgrade.php

Note: the user may be different to www-data on your system.
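Before running the upgrade in step 3, it can help to confirm that both plugins are in the expected directories. The following is a minimal sketch only: check_plugin_dirs is a hypothetical helper, not something shipped with the plugin, and the demo uses a temporary directory as a stand-in for <moodledir>.

```shell
# Hypothetical helper (not shipped with the plugin): report whether the two
# plugin directories from steps 1 and 2 exist under the given Moodle root.
check_plugin_dirs() {
    moodledir=$1
    status=0
    for dir in files/converter/librelambda local/aws; do
        if [ -d "$moodledir/$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}

# Demo against a temporary directory standing in for <moodledir>.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/files/converter/librelambda"
before=$(check_plugin_dirs "$demo_root" || true)   # local/aws not yet present
mkdir -p "$demo_root/local/aws"
after=$(check_plugin_dirs "$demo_root")
printf '%s\n' "Before:" "$before" "After:" "$after"
rm -rf "$demo_root"
```

On a real installation the call would be check_plugin_dirs /var/www/moodle, or wherever your Moodle code lives.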

Once the plugin is installed, the Moodle setup needs to be performed.

Note: It is recommended that installation be completed via the command line instead of the Moodle user interface.

Moodle setup

The following steps are required to set up PDF annotation in Moodle.

Enable Annotation

PDF Annotation needs to be enabled at site level for your Moodle installation. To do this:

  1. Log into the Moodle UI as a site administrator
  2. Navigate to the Annotate PDF settings: Site administration > Plugins > Activity modules > Assignment > Feedback plugins > Annotate PDF
  3. Make sure the Enabled by default check box is checked
  4. Click Save changes

Set Ghostscript Executable

Moodle uses Ghostscript (https://www.ghostscript.com/) to annotate the PDF files themselves. To use PDF Annotation your Moodle instance must be able to reach the Ghostscript executable. To do this:

  1. Log into the Moodle UI as a site administrator
  2. Navigate to the System path settings: Site administration > Server > System paths
  3. Enter the path to the Ghostscript executable in the Path to ghostscript setting text box.
  4. Click Save changes

Note: In some Moodle installations setting system paths is disabled. You may need to contact your system administrator or Moodle vendor to have this value set.

Enable Document Converter

The Libre Lambda document converter must be enabled in Moodle before it can be used to convert documents. To do this:

  1. Log into the Moodle UI as a site administrator
  2. Navigate to the Manage document converter settings: Site administration > Plugins > Document converters > Manage document converters
  3. Click the enable eye icon in the table row that corresponds to: Libre Lambda Document Converter

Before the converter can be used, the required AWS infrastructure needs to be set up. This is covered in the next section.

AWS Stack Setup

Binaries and scripts required for the stack are kept in a separate repository (https://github.com/catalyst/moodle-fileconverter_librelambda-aws_stack). The provision script will try to check it out.

The following steps will set up the Amazon Web Services (AWS) infrastructure. The AWS infrastructure is required to do the actual conversion of documents into PDF. While setting up the AWS infrastructure is largely automated by scripts included in this plugin, a working knowledge of AWS is highly recommended.

For more information on how the submitted files are processed in AWS please refer to the topic: Conversion Architecture

This step should be completed once the plugin has been installed into your Moodle instance and the other Moodle setup tasks have been completed.

Note: Full support on setting up an AWS account and API access keys for AWS stack infrastructure provisioning is beyond the scope of this guide.

To set up the AWS conversion stack infrastructure:

  1. Create an AWS account, see: https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/ for information on how to do this.
  2. Create an AWS API user with administrator access and generate an API Key ID and an API Secret Key, see: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html for information on how to do this.
  3. Change to your Moodle instance application directory. e.g. cd /var/www/moodle
  4. Run the provisioning script below, replacing <keyid> and <secretkey> with the AWS API Key ID and AWS API Secret Key that you obtained in step 2.
    Replace <region> with the AWS region in which you wish to set up your AWS stack, e.g. ap-southeast-2. The list of regions available can be found here: https://docs.aws.amazon.com/general/latest/gr/rande.html#lambda_region
    The command to execute is:
sudo -u www-data php files/converter/librelambda/cli/provision.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--set-config

Note: the user may be different to www-data on your system.

The --set-config option will automatically set the plugin settings in Moodle based on the results returned by the provisioning script.

The script will return output similar to the following:

== Provisioning the Lambda function and stack resources ==
Stack status: CREATE_IN_PROGRESS
Stack status: CREATE_IN_PROGRESS
Stack status: CREATE_IN_PROGRESS
Stack status: CREATE_COMPLETE
Cloudformation stack created. Stack ID is: arn:aws:cloudformation:ap-southeast-2:693620471840:stack/LambdaConvert/4d609630-2760-11e9-b6a5-02181cf5d610

== Converter params ==
S3 user access key: AKIAxxxxxxxxxxxxxxxx
S3 user secret key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Input Bucket: xxxxxxxxxxxxxxxxxxxxxxxx-input
Output Bucket: xxxxxxxxxxxxxxxxxxxxxxxx-output
== Setting plugin configuration in Moodle, from returned settings. ==
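If you save the provisioning output to a file, the converter parameters can be pulled out of it rather than copied by hand, which is useful when --set-config was not used. The following is a minimal sketch, assuming the output keeps the "Label: value" layout shown above. get_param is a hypothetical helper, and the keys and bucket names in the demo are made-up placeholders rather than real script output.

```shell
# Hypothetical helper: print the value that follows "<label>: " in a saved
# copy of the provisioning output. Assumes the "Label: value" layout above.
get_param() {
    # $1 = label, e.g. "Input Bucket"; $2 = file holding the saved output
    awk -F': ' -v label="$1" '$1 == label { print $2; exit }' "$2"
}

# Demo with made-up placeholder values in place of real script output.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
== Converter params ==
S3 user access key: AKIAxxxxxxxxxxxxxxxx
S3 user secret key: xxxxxxxxxxxxxxxxxxxx
Input Bucket: examplestack-input
Output Bucket: examplestack-output
EOF

input_bucket=$(get_param "Input Bucket" "$sample_log")
output_bucket=$(get_param "Output Bucket" "$sample_log")
echo "Input bucket:  $input_bucket"
echo "Output bucket: $output_bucket"
rm -f "$sample_log"
```

The saved file contains the S3 secret key, so treat it like any other credential and delete it once the values have been entered in Moodle.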

What is created in AWS land:

Multiple stacks

The provisioning script creates a stack with the default name of LambdaConvert. If you need to give it a different name, or want multiple stacks, use the --stack-name option, e.g.:

sudo -u www-data php files/converter/librelambda/cli/provision.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--stack-name=LambdaConvertTest \
--set-config

Updating (reprovisioning)

Running the provision script again will replace (reprovision) the stack.

To avoid accidental overwriting, the --replace-stack option must be given when updating. Without it, the script reports that the stack exists and stops:

sudo -u www-data php files/converter/librelambda/cli/provision.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--set-config

Stack status: CREATE_COMPLETE
Stack exsists and replacement not requested.
If you want to replace the stack use "--replace-stack" option

...

sudo -u www-data php files/converter/librelambda/cli/provision.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--replace-stack \
--set-config

== Provisioning the Lambda function and stack resources ==
...

Stack removal

The stack can be removed with the --remove-stack option.

sudo -u www-data php files/converter/librelambda/cli/provision.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--remove-stack

Stack status: CREATE_COMPLETE
Do you really want to remove "LambdaConvert" stack? [Type "yes" to confirm]:

yes[ENTER]

Stack status: DELETE_IN_PROGRESS
Removed

Common errors

Removing non-empty bucket

Buckets that are not empty cannot be removed. This error may occur when removing or updating a stack: sometimes a stack will not update in place, in which case it is removed and recreated.

This error will be visibly reported.

Plugin Setup

Once the AWS stack infrastructure setup has been completed, the Libre Lambda converter plugin in Moodle needs to be configured.

Note: These steps only need to be completed if you did not use the --set-config option when running the AWS stack setup provisioning script. Otherwise the plugin is already configured, and you can use the steps below to verify the settings.

To configure the plugin in Moodle:

  1. Log into the Moodle UI as a site administrator
  2. Navigate to the Libre Lambda Document converter settings: Site administration > Plugins > Document converters > Libre Lambda Document Converter
  3. Enter the values for: Key, Secret, Input bucket, Output bucket, and Region from the corresponding values returned by the provisioning script. E.g. Region: ap-southeast-2
  4. Click Save changes

Testing Document Conversion

There are two ways to test the document conversion. The first is by a command line test script that tests the AWS architecture independent of Moodle. The second uses the regular Moodle workflow to test the conversion process end to end. The following sections outline both.

Conversion test script

Once the AWS architecture has been set up using the provisioning script, it can be tested from the command line.

The following test command runs a basic conversion in AWS and returns the result status. To run the script:

  1. Change to your Moodle instance application directory. e.g. cd /var/www/moodle
  2. Run the following command, replacing <keyid> and <secretkey> with the AWS API Key ID and AWS API Secret Key that you obtained in the AWS Stack Setup.
    Replace <region> with the AWS region used in the AWS stack setup, e.g. ap-southeast-2.
    Replace <inputbucket> and <outputbucket> with the buckets from the setup.
    Finally, enter the path to the file you wish to convert to PDF:
sudo -u www-data php files/converter/librelambda/cli/test.php \
--keyid=<keyid> \
--secret=<secretkey> \
--region=<region> \
--input-bucket=<inputbucket> \
--output-bucket=<outputbucket> \
--file='/var/www/moodle/files/converter/librelambda/tests/fixtures/testsubmission.odt' \
--use-sdk-creds=0

To use credentials set in the AWS credentials file, use --use-sdk-creds=1 and leave out the key options (https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_credentials_profiles.html):

sudo -u www-data php files/converter/librelambda/cli/test.php \
--region=<region> \
--input-bucket=<inputbucket> \
--output-bucket=<outputbucket> \
--file='/var/www/moodle/files/converter/librelambda/tests/fixtures/testsubmission.odt' \
--use-sdk-creds=1

Note: the user may be different to www-data on your system.

Note: For unknown reasons, the first test run after pushing to the AWS stack may fail. If it does, run the test again.
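Because the first run may fail, a small retry wrapper can save re-typing the command. The following is a minimal sketch: retry is a hypothetical helper, not part of the plugin, and flaky_test is a stand-in that fails once and then succeeds so the example runs without AWS access.

```shell
# Hypothetical helper: run a command up to N times, stopping at first success.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Stand-in for the real test command: fails on its first call, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky_test() {
    count=$(( $(cat "$counter_file") + 1 ))
    echo "$count" > "$counter_file"
    [ "$count" -ge 2 ]
}

if retry 3 flaky_test; then
    attempts_used=$(cat "$counter_file")
    echo "Succeeded after $attempts_used attempts"
fi
rm -f "$counter_file"
```

With the real script, the call would start with retry 3 sudo -u www-data php files/converter/librelambda/cli/test.php, followed by the same options as in the commands above.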

Moodle assignment conversion

A full end to end test can be performed in Moodle. This section outlines this process.

Note: Cron must be configured in your Moodle instance for document conversion to operate. Information on setting up Cron on your Moodle instance can be found here: https://docs.moodle.org/36/en/Cron

To set up and run the test in Moodle:

  1. Log into the Moodle UI as a site administrator.
  2. Create a new Moodle course.
  3. Create a new Moodle user.
  4. Enrol the user as a student in the course created in step 2.
  5. In the Moodle course, create an assignment activity.
  6. In the assignment setup, enable File submissions as a submission type.
  7. In the assignment setup, enable Annotate PDF as a feedback type.
  8. Log into Moodle as the test student user.
  9. Submit an assignment as the test student.
  10. Wait for the system cron to run.
  11. Log back into Moodle as an administrator.
  12. Access the course and then the assignment.
  13. Click on grade in the assignment screen.
  14. The PDF of the submission should be displayed.

Additional Information

The following sections provide an overview of some additional topics for this plugin and its associated AWS architecture.

Conversion Architecture

The image below shows the high-level architecture that the plugin provisioning process sets up in AWS.

[Image: Conversion Architecture]

The conversion process and AWS architecture is relatively simple:

There are no traditional servers or compute resources involved in the conversion process. Storage for the uploaded and converted documents is provided by S3 (an AWS object storage service). The conversion processing is handled by Lambda (an AWS function-on-demand runtime service). This means compute resources are only used when they are invoked by a document upload, and they are stopped when the document conversion is finished.

This architecture is also very scalable and can handle a high degree of parallelism. Every time a document is uploaded to the input bucket a new Lambda function is invoked (up to an initial limit of 1,000 concurrent invocations). The Lambda functions are fully self-contained and have everything they need to convert a document. This means newly uploaded documents don't need to wait for previous documents to be converted before their own conversion starts.
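To get a feel for what this parallelism means in practice, the following back-of-envelope sketch estimates a lower bound on the wall-clock time to convert a large batch uploaded all at once. It is a simplified model and not an AWS guarantee: it treats the 1,000 limit as a cap on simultaneous conversions (an assumption about what the limit means), uses the 4 and 80 second typical conversion times quoted in the FAQs, and ignores warm-up time and cron delays. min_wall_seconds is a hypothetical helper.

```shell
# Back-of-envelope lower bound: total conversion seconds spread evenly across
# the concurrency limit. This is a simplified model, not an AWS guarantee.
min_wall_seconds() {
    # $1 = number of documents, $2 = seconds per conversion, $3 = concurrency
    echo $(( $1 * $2 / $3 ))
}

fast=$(min_wall_seconds 100000 4 1000)    # every conversion takes 4 seconds
slow=$(min_wall_seconds 100000 80 1000)   # every conversion takes 80 seconds
echo "100,000 documents: between $fast and $slow seconds"
```

Under these assumptions, 100,000 documents would need somewhere between 400 and 8000 seconds of conversion time in AWS.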

Privacy and Data Control

Student data privacy is very important, especially when sending data out of Moodle and supplying it to third party services. This plugin was designed with privacy and security in mind. Some of the privacy and security features are outlined below.

Cost Profiling

The following outlines the costs involved using this plugin to convert 100,000 documents to PDF. All costs are in AUD.

Costs for cloud-based services have a lot of individual elements and can be confusing, so it is often better to use a concrete example. Below is the cost breakdown for a conversion test of 100,000 source documents. The 100,000 documents require 38 GB of storage space.

Documents to convert: 100,000
Avg file size (MB): 0.38

Operation                    Unit          Unit cost    Total        Notes
Put to S3 Input bucket       Per request   0.0000055    $0.55
S3 Input Bucket Storage      Per GB        0.025        $0.95
S3 Input to Lambda           Per request   0.00000044   $0.04
Lambda Invocations           Per request   0            $0.00        First million per month free
Lambda Executions            GB-second     0            $0.00        400,000 GB-seconds per month free
Lambda to S3 Output          Per request   0.0000055    $0.55
S3 Output Bucket Storage     Per GB        0.025        $0.95
Get from S3 Output Bucket    Per request   0.00000044   $0.04
First GB transfer out        Per GB        0            $0.00
1GB - 9.999TB Transfer out   Per GB        0.114        $4.22
Total                                                   $7.31
Per doc                                                 $0.0000731
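The totals can be reproduced from the unit prices. The following sketch recomputes them with awk. It assumes that output storage and transfer volume equal the 38 GB of input (which is what the figures in the table imply), that the first GB transferred out is free, and that Lambda usage stays within the free tier. The variable names are illustrative only.

```shell
# Recompute the totals in the table from its unit prices (AUD).
cost_summary=$(awk 'BEGIN {
    docs = 100000                 # documents converted
    storage_gb = 38               # storage used by the source documents
    put = docs * 0.0000055        # per-request price on the Put rows
    get = docs * 0.00000044       # per-request price on the other request rows
    store = storage_gb * 0.025    # per-GB storage price
    transfer = (storage_gb - 1) * 0.114   # first GB out is free
    # Each request and storage charge appears twice: once for input, once for output.
    total = 2 * put + 2 * store + 2 * get + transfer
    printf "total=%.2f per_doc=%.7f", total, total / docs
}')
echo "$cost_summary"
```

The individual rows in the table are rounded to the cent, which is why they add up to $7.30 while the unrounded total rounds to $7.31.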

Cost profiling resources:

Libre Office Archive and Compilation

This plugin includes a precompiled LibreOffice build as a compressed archive in the /libre folder of this repository. The archive is uploaded to AWS as part of the provisioning process. Lambda uses the uncompressed binaries to do the actual conversion of the uploaded documents to PDF.

The precompiled binary archive for LibreOffice is provided as a convenience to make setting everything up easier. However, you can obtain the LibreOffice source code and compile it yourself. See the section: Compiling Libre Office for instructions on how to do this.

FAQs

Why make this plugin?

Moodle currently ships with two (2) file converter plugins: Unoconv and Google Drive Converter. These plugins use external services to convert submitted files to PDF. In our experience using these plugins for production Moodle instances, both have issues. These issues are especially bad in sites that convert a lot of files. The issues are mainly related to performance but there are privacy concerns as well. In order to address these issues we decided to make this plugin.

The following are the broad criteria we used when making this plugin, all of which it aims to address:

Does my Moodle need to be in AWS?

No, your Moodle instance doesn’t need to reside in AWS to use this plugin. As long as your Moodle can contact the AWS S3 endpoint via HTTPS you should be able to use this plugin. This includes development environments.

How long does document conversion take?

Typical conversion times are between 4 and 80 seconds. This is how long the conversion takes in AWS; the time it takes for the converted document to be available in Moodle depends on the timing of cron runs.

Conversion time is variable and depends on the source document. Also if you haven't done a conversion for a while there may be a "warm up" time for the AWS architecture.

Why AWS?

TODO: this

Inspiration

This plugin was inspired by and based on the initial work done by Vlad Holubiev to compile and run Libre Office within an AWS Lambda function.

License

2018 Matt Porritt mattp@catalyst-au.net

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.