Mark Edmondson 8/3/2017
R + Docker + containerit + Google Build Triggers + GitHub + Plumber + Swagger (OpenAPI) + App Engine flexible custom runtime + Google Cloud Endpoints = a serverless, scalable R API that can be called from non-R SDKs, with built-in OAuth2, auth keys and monitoring.
The below is adapted from the guide here, which has more detail.
Use library(containerit) to create a Dockerfile of the dependencies your R code needs. Start the Dockerfile FROM gcr.io/gcer-public/plumber-appengine to speed up build times, as it has plumber preinstalled. It also has googleCloudStorageR so you can call data into your scripts, but in those cases you will need to also supply an auth.json.
gcr.io/gcer-public/plumber-appengine is a publicly built Docker image from the googleComputeEngineR project, but you can make your own private one if you like and use Build Triggers to create it.
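As a sketch of the containerit step (assuming the containerit package is installed; "api.R" stands in for your own script), generating a Dockerfile looks something like:

```r
library(containerit)

# Inspect the script and its dependencies, and build a Dockerfile
# object from them, based on the prebuilt plumber image
d <- dockerfile(from = "api.R",
                image = "gcr.io/gcer-public/plumber-appengine")

# Write the Dockerfile out to the current directory
write(d, file = "Dockerfile")
```

You can then edit the generated Dockerfile by hand before deploying.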
In any case, the Dockerfile is much simpler:
FROM gcr.io/gcer-public/plumber-appengine
LABEL maintainer="mark"
## uncomment as needed
# RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
# && apt-get install -y
## uncomment as needed
# RUN ["install2.r", "-r 'https://cloud.r-project.org'", ""]
# RUN ["installGithub.r", ""]
WORKDIR /payload/
COPY [".", "./"]
CMD ["api.R"]
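The CMD ["api.R"] above points at your plumber script. A minimal sketch of such a script (the endpoint names here are hypothetical, not from the guide) looks like:

```r
# api.R - endpoints are defined with plumber's #* decorations

#* Health check
#* @get /hello
function() {
  list(msg = "hello from R")
}

#* Square a number
#* @param n a number
#* @get /square
function(n) {
  as.numeric(n)^2
}
```

The base image's entrypoint runs plumber against this file on the port App Engine expects.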
An example app.yaml:
runtime: custom
env: flex
env_variables:
GCS_AUTH_FILE: auth.json
If using googleCloudStorageR in your script, you will need to include your own auth.json service JSON key from your Google Cloud project in the folder you upload.
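A sketch of what the script side might then look like (the bucket and object names are placeholders; googleCloudStorageR picks up the GCS_AUTH_FILE environment variable set in app.yaml when it loads):

```r
library(googleCloudStorageR)  # authenticates via GCS_AUTH_FILE on attach

# Fetch an object from a bucket into the running API instance
# ("my-bucket" and "model-data.csv" are placeholder names)
gcs_global_bucket("my-bucket")
df <- gcs_get_object("model-data.csv")
```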
Deploy with:
gcloud app deploy --project your-project
The App Engine is now up and running with your plumber-powered R API. It will auto-scale as more connections are added, as configured in the app.yaml (reference).
We now enable Google Cloud Endpoints following this guide.
The swagger.json generated by plumber is available at https://your-project.appspot.com/swagger.json, or if you run it locally via http://127.0.0.1:8080/swagger.json
library(yaml)
library(jsonlite)

# Fetch the plumber swagger.json and rewrite it into the openapi.yaml
# that Cloud Endpoints expects (host set, operationId on each endpoint)
make_openapi <- function(projectId){

  json <- jsonlite::fromJSON(sprintf("https://%s.appspot.com/swagger.json", projectId))
  json$host <- sprintf("%s.appspot.com", projectId)

  ## add operationId to each endpoint
  ohgod <- lapply(names(json$paths), function(x) {
    lapply(json$paths[[x]], function(verb) {
      verb$operationId <- basename(x)
      verb
    })
  })
  json$paths <- setNames(ohgod, names(json$paths))

  ## silly formatting: as.yaml writes these as scalars, not arrays
  yaml <- gsub("application/json", "[application/json]", yaml::as.yaml(json))
  yaml <- gsub("schemes: http", "schemes: [http]", yaml)

  writeLines(yaml, con = "openapi.yaml")
}
make_openapi("your-project-id")
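To make the operationId step above concrete, here is the same transformation run on a toy paths list (the path and verb names are made up for illustration):

```r
# A toy swagger-style paths list: two paths, each with some verbs
paths <- list(
  "/predict" = list(get  = list(summary = "predict"),
                    post = list(summary = "predict")),
  "/hello"   = list(get  = list(summary = "hello"))
)

# For each path, stamp the path's basename onto every verb as operationId
with_ids <- lapply(names(paths), function(x) {
  lapply(paths[[x]], function(verb) {
    verb$operationId <- basename(x)
    verb
  })
})
paths <- setNames(with_ids, names(paths))

paths$`/predict`$post$operationId  # "predict"
```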
(issue raised with plumber to see if this can be handled by the library)
gcloud service-management deploy openapi.yaml --project your-project
gcloud service-management configs list --service=your-project.appspot.com
to see the service management name and config ID you just uploaded. Then modify the app.yaml of the App Engine to include the info you got from the listing:
endpoints_api_service:
  # The following values are to be replaced by information from the output of
  # the 'gcloud service-management deploy openapi.yaml' command.
  name: ENDPOINTS-SERVICE-NAME
  config_id: ENDPOINTS-CONFIG-ID
Save, then deploy the app again via gcloud app deploy --project your-project
Once deployed, the /swagger.json endpoint won't be available any more, as it's not in the API spec.
You should now see monitoring and logs for your API in the Google Cloud console.
You can now play around with Cloud endpoints features by modifying the configuration files.
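For example, to require an API key you could add a security definition to openapi.yaml along these lines (a sketch based on the standard Cloud Endpoints API-key convention, not taken from the guide; apply it globally or per path as needed):

```yaml
# Require ?key=YOUR_API_KEY on requests (Cloud Endpoints convention)
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
security:
  - api_key: []
```

Redeploy the spec and the app for changes to take effect.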
If you have peaks of traffic that spawn more instances, followed by idle periods, it's the total number of instance hours that counts (e.g. a one-hour peak that launches 24 instances will cost the same as 24 hours of constant traffic that needs only one instance). You determine how large these instances are and when they spawn (normally when they hit 50% of CPU), so it could be cheaper to have one large instance rather than two small ones.
The automatic scaling and the resources of each instance will be the largest determinant of cost. Use the monitoring to get the latency of the API and configure the app.yaml accordingly to get the performance you require, which will determine when extra instances running your R code are launched. For example, if you ran the default auto scaling with the default resources (2 instances with 1 CPU core and 0.6GB RAM each) and you have enough API traffic for 24 hours of constant usage, it will cost $2.73 per day.
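As a sketch, the scaling knobs in app.yaml for the flexible environment look something like this (the values here are illustrative, not recommendations):

```yaml
runtime: custom
env: flex
resources:
  cpu: 1
  memory_gb: 0.6
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5
  cpu_utilization:
    target_utilization: 0.5
```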
Make sure to put billing alerts and a maximum spend on your App Engine app to avoid big charges. I typically put a $1-a-day limit on App Engine when testing, just to make sure nothing huge can go wrong through a misconfiguration.