Mark Edmondson 8/3/2017
R + Docker + containerit + Google Build Triggers + GitHub + Plumber + Swagger (OpenAPI) + App Engine flexible custom runtime + Google Cloud Endpoints = a serverless, scalable R API that can be called from non-R SDKs, with built-in OAuth2, auth keys and monitoring.
The below is adapted from the guide here, which has more detail.
Use library(containerit) to create a Dockerfile of the dependencies your R code needs. Start the Dockerfile FROM gcr.io/gcer-public/plumber-appengine to speed up build times, as it has plumber preinstalled. It also has googleCloudStorageR so you can call data into your scripts, but in those cases you will need to also supply an auth.json.
gcr.io/gcer-public/plumber-appengine is a publicly built Docker image from the googleComputeEngineR project, but you can make your own private one if you like and use Build Triggers to create it.
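As a sketch of the containerit step (assuming the containerit package is installed; "api.R" stands in for your own script), generating a Dockerfile looks something like:

```r
library(containerit)

# Inspect the script and its dependencies, and build a Dockerfile
# object from them, based on the prebuilt plumber image
d <- dockerfile(from = "api.R",
                image = "gcr.io/gcer-public/plumber-appengine")

# Write the Dockerfile out to the current directory
write(d, file = "Dockerfile")
```

You can then edit the generated Dockerfile by hand before deploying.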
In any case, the Dockerfile is much simpler:
FROM gcr.io/gcer-public/plumber-appengine
LABEL maintainer="mark"
## uncomment as needed
# RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
# && apt-get install -y
## uncomment as needed
# RUN ["install2.r", "-r 'https://cloud.r-project.org'", ""]
# RUN ["installGithub.r", ""]
WORKDIR /payload/
COPY [".", "./"]
CMD ["api.R"]
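The CMD ["api.R"] above points at your plumber script. A minimal sketch of such a script (the endpoint names here are hypothetical, not from the guide) looks like:

```r
# api.R - endpoints are defined with plumber's #* decorations

#* Health check
#* @get /hello
function() {
  list(msg = "hello from R")
}

#* Square a number
#* @param n a number
#* @get /square
function(n) {
  as.numeric(n)^2
}
```

The base image's entrypoint runs plumber against this file on the port App Engine expects.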
An example app.yaml:
runtime: custom
env: flex
env_variables:
GCS_AUTH_FILE: auth.json
If using googleCloudStorageR in your script, you will need to include your own auth.json service JSON key from your Google Cloud project in the folder you upload.
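A sketch of what the script side might then look like (the bucket and object names are placeholders; googleCloudStorageR picks up the GCS_AUTH_FILE environment variable set in app.yaml when it loads):

```r
library(googleCloudStorageR)  # authenticates via GCS_AUTH_FILE on attach

# Fetch an object from a bucket into the running API instance
# ("my-bucket" and "model-data.csv" are placeholder names)
gcs_global_bucket("my-bucket")
df <- gcs_get_object("model-data.csv")
```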
Deploy with:
gcloud app deploy --project your-project
The App Engine is now up and running with your plumber-powered R API. It will auto-scale as more connections are added, as configured in the app.yaml (reference).
We now enable Google Cloud Endpoints following this guide.
The swagger.json generated by plumber is available at https://your-project.appspot.com/swagger.json, or if you run it locally via http://127.0.0.1:8080/swagger.json
library(yaml)
library(jsonlite)

# Fetch the plumber swagger.json and rewrite it into the openapi.yaml
# that Cloud Endpoints expects (host set, operationId on each endpoint)
make_openapi <- function(projectId){

  json <- jsonlite::fromJSON(sprintf("https://%s.appspot.com/swagger.json", projectId))
  json$host <- sprintf("%s.appspot.com", projectId)

  ## add operationId to each endpoint
  ohgod <- lapply(names(json$paths), function(x) {
    lapply(json$paths[[x]], function(verb) {
      verb$operationId <- basename(x)
      verb
    })
  })
  json$paths <- setNames(ohgod, names(json$paths))

  ## silly formatting: as.yaml writes these as scalars, not arrays
  yaml <- gsub("application/json", "[application/json]", yaml::as.yaml(json))
  yaml <- gsub("schemes: http", "schemes: [http]", yaml)

  writeLines(yaml, con = "openapi.yaml")
}
make_openapi("your-project-id")
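To make the operationId step above concrete, here is the same transformation run on a toy paths list (the path and verb names are made up for illustration):

```r
# A toy swagger-style paths list: two paths, each with some verbs
paths <- list(
  "/predict" = list(get  = list(summary = "predict"),
                    post = list(summary = "predict")),
  "/hello"   = list(get  = list(summary = "hello"))
)

# For each path, stamp the path's basename onto every verb as operationId
with_ids <- lapply(names(paths), function(x) {
  lapply(paths[[x]], function(verb) {
    verb$operationId <- basename(x)
    verb
  })
})
paths <- setNames(with_ids, names(paths))

paths$`/predict`$post$operationId  # "predict"
```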
(issue raised with plumber to see if this can be handled by the library)
gcloud service-management deploy openapi.yaml --project your-project
gcloud service-management configs list --service=your-project.appspot.com
to see the service management name and config ID you just uploaded. Then modify the app.yaml of the App Engine to include the info you got from the listing:
endpoints_api_service:
  # The following values are to be replaced by information from the output of
  # the 'gcloud service-management deploy openapi.yaml' command.
  name: ENDPOINTS-SERVICE-NAME
  config_id: ENDPOINTS-CONFIG-ID
Save, then deploy the app again via gcloud app deploy --project your-project
Once deployed, the /swagger.json endpoint won't be available any more, as it's not in the API spec.
You should now see monitoring and logs for your API in the Google Cloud console.
You can now play around with Cloud endpoints features by modifying the configuration files.
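For example, to require an API key you could add a security definition to openapi.yaml along these lines (a sketch based on the standard Cloud Endpoints API-key convention, not taken from the guide; apply it globally or per path as needed):

```yaml
# Require ?key=YOUR_API_KEY on requests (Cloud Endpoints convention)
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
security:
  - api_key: []
```

Redeploy the spec and the app for changes to take effect.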
If you have peaks of traffic that spawn more instances, followed by idle periods, it's the total number of instance hours that counts (e.g. a one-hour peak that launches 24 instances will cost the same as 24 hours of constant traffic that needs only one instance). You determine how large these instances are and when they spawn (normally when they hit 50% of CPU), so it could be cheaper to have one large instance rather than two small ones.
The automatic scaling and the resources of each instance will be the largest determinant of cost. Use the monitoring to get the latency of the API and configure the app.yaml accordingly to get the performance you require, which will determine when extra instances running your R code are launched. For example, if you ran the default auto scaling with the default resources (2 instances with 1 CPU core and 0.6GB RAM each) and you have enough API traffic for 24 hours of constant usage, it will cost $2.73 per day.
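As a sketch, the scaling knobs in app.yaml for the flexible environment look something like this (the values here are illustrative, not recommendations):

```yaml
runtime: custom
env: flex
resources:
  cpu: 1
  memory_gb: 0.6
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5
  cpu_utilization:
    target_utilization: 0.5
```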
Make sure to put billing alerts and a maximum spend on your App Engine app to avoid big charges. I typically put a $1-a-day limit on App Engine when testing, just to make sure nothing huge can go wrong through a misconfiguration.