The-Politico / generator-politico-interactives

A Yeoman generator to scaffold a development environment for building POLITICO interactives.

Interactives 2.0 #120

Open brizandrew opened 5 years ago

brizandrew commented 5 years ago

The interactives generator has served us well for a solid amount of time. It has served Beatrice and me significantly less (I think we've only used it once), but as a team, I hear it's been great.

But all good software eventually grows old and needs to be upgraded. This is my plan to do just that.

Design Principles

This plan was created with a few key principles in mind.

1. Eliminating Redundancy

We don't need Gulp. Promises and async have made npm's native task management viable enough to use. Even our Webpack config is full of redundancy. Redundant config makes debugging more difficult. If a build is failing, it should be as clear as possible where the problem is.

2. Simplifying On-Boarding

As we hire new developers, we need to make the setup process easier. Ideally it would be a single install; unfortunately, for security, credentials also need to be set up. But those should be the only two steps.

3. Reducing Friction Around Development

From starting a new project to managing static files to publishing, these steps should require little thought from interactives devs. The system needs to be flexible enough to handle agile development but strict enough to discourage bad code.

4. Centralizing Content

Devs get new jobs. Devs get sick. Devs go MIA sometimes. Editors (both on the interactives team and across the newsroom) need a way to edit content without a developer in the loop.

5. Centralizing Configuration

Along that same vein, we have too many config files. Each repo should have ONE location for configuration with as little redundancy as possible (see principle one).

With these five principles in mind, let’s see how to achieve some of them:

De-Gulpify Processes

First things first: we don't need Gulp. We've already removed it from our newer builds, and there are only a couple of loose ends that need to be de-gulpified. Here's my plan to do that, task by task.

Archie

We should be using the Archie Webpack loader to load in these files once they're downloaded. This will allow us to import them just like we do markdown and text from a ./src/content directory. I would add that we should also have a pure JSON downloader for non-Archie data files, because it's easy and I have ideas for how to use it in the future.
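
For reference, a minimal sketch of what that wiring could look like, assuming the archieml parser package, a .aml file extension, and a hypothetical inline loader file (an off-the-shelf ArchieML loader could be swapped in instead):

// webpack.config.js (sketch)
const path = require('path');

module.exports = {
  module: {
    rules: [
      {
        // Treat downloaded ArchieML docs in ./src/content like any other import.
        test: /\.aml$/,
        include: path.resolve(__dirname, 'src/content'),
        use: [path.resolve(__dirname, 'bin/archieml-loader.js')],
      },
    ],
  },
};

// bin/archieml-loader.js (sketch)
const archieml = require('archieml');

module.exports = function archiemlLoader(source) {
  // Parse the ArchieML text and export it as a plain JS object.
  return `module.exports = ${JSON.stringify(archieml.load(source))};`;
};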

AWS

This was pretty much already degulpified in DIY Congress here.

Webpack Build

Also pretty much done in DIY Congress here. We should use the more complex config system used on election pages to avoid all the duplicate code we normally have in these configs and to create configurable environments on a project-by-project basis.
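
As a rough sketch (not the election pages config itself): webpack lets the config be exported as a function of an env object, so one shared base file carries the common settings and each environment only overrides what differs. webpack-merge and the file names are assumptions here.

// webpack.config.js (sketch)
const merge = require('webpack-merge');
const common = require('./webpack.common.js'); // shared entry, loaders, etc.

module.exports = (env = {}) => {
  // e.g. `yarn build --env.production` (webpack 4-style flag)
  const production = Boolean(env.production);
  return merge(common, {
    mode: production ? 'production' : 'development',
    devtool: production ? false : 'eval-source-map',
  });
};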

Data

Just use cp instead of gulp.
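
In package.json terms that's one line; the script name and paths here are assumptions:

{
  "scripts": {
    "data": "mkdir -p dist/data && cp -R src/data/. dist/data"
  }
}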

Webpack Dev Server

Also already done here.

Local Dist Preview Server

This script isn't actually tied to gulp, so we can just port it over. I don't even know how much it's needed now that Chrome has its own preview server capabilities built in.

Build HTML (With Hashes)

Rather than creating HTML templates via template strings like we did in DIY Congress, or keeping a separate nunjucks template from our preview index.html like we did in election pages, I propose we use the nunjucks-webpack-plugin to create a single index file in templates.

Hashes are optional, but given our success with them so far in avoiding caching errors and helping with debugging, I suggest we keep them. We can keep the hash generator we used in election pages and import it in our Webpack config like this. The hash can be used to name the build files like we do here and here, and then passed into the nunjucks template's context in that same config file.
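
A sketch of how that could hang together, with a stand-in for the election-pages hash generator; the plugin option names are taken from the nunjucks-webpack-plugin README and worth double-checking against the version we pin:

// webpack.config.js (sketch)
const NunjucksWebpackPlugin = require('nunjucks-webpack-plugin');

// Stand-in for the hash generator we use in election pages.
const hash = Date.now().toString(36);

module.exports = {
  output: {
    filename: `scripts.${hash}.js`, // hashed build file
  },
  plugins: [
    new NunjucksWebpackPlugin({
      templates: [{
        from: './templates/index.html',
        to: 'index.html',
        context: { hash }, // the template can reference the same hash
      }],
    }),
  ],
};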

Image Processing

The popular Node image-processing library is jimp. With a bit of extra code around it to get a list of all the files in the img directory and loop through them, we can easily replace gulp-responsive.
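
Something along these lines, assuming src/img and dist/img paths and an arbitrary max width; the read/resize/quality/write calls are jimp's documented API:

// bin/images.js (sketch)
const fs = require('fs');
const path = require('path');
const Jimp = require('jimp');

const SRC = path.resolve(__dirname, '../src/img');
const OUT = path.resolve(__dirname, '../dist/img');

async function main() {
  fs.mkdirSync(OUT, { recursive: true });
  const files = fs.readdirSync(SRC).filter((f) => /\.(png|jpe?g)$/i.test(f));
  for (const file of files) {
    // Resize to a max width, recompress, and write to dist --
    // roughly the job gulp-responsive was doing for us.
    const img = await Jimp.read(path.join(SRC, file));
    await img.resize(1600, Jimp.AUTO).quality(80).writeAsync(path.join(OUT, file));
  }
}

main();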

Spreadsheet

The current system is deeply tied to GDrive. We can probably replace most of it using the same code used to build api-to-sheets, specifically the GAPI class. That class also handles pulling down the data, so we would just have to create a script that runs it.

Watch Scripts (code, data, images)

Watch scripts for code will be handled by the webpack dev server. Watch scripts that rerun the image-processing script and cp the data can be handled through parallel nodemon scripts run on yarn start, as suggested here.
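
In package.json that might look like the following, with npm-run-all (run-p) assumed as the parallel runner and the script names and paths as placeholders:

{
  "scripts": {
    "dev": "webpack-dev-server",
    "data": "mkdir -p dist/data && cp -R src/data/. dist/data",
    "img": "node bin/images.js",
    "watch:data": "nodemon --watch src/data --ext json,csv --exec \"yarn data\"",
    "watch:img": "nodemon --watch src/img --ext jpg,jpeg,png --exec \"yarn img\"",
    "start": "run-p dev watch:data watch:img"
  }
}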

Replacing GDrive

Replacing GDrive is probably going to be one of the bigger challenges in modernizing our rig.

For spreadsheets, it should be easy enough to port over GAPI from api-to-sheets and add a method for the rows API endpoint.

For Archie, we're going to have to configure an app using the Drive API, probably using the official Node library. The Archie folks already have an example where they do something like that here, but it's in an Express server, which we won't want.
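
A simplified sketch of the Drive side using the official googleapis package; a real implementation would also want the extra text cleanup the Archie folks do, and the key path, output location, and CLI shape here are assumptions:

// bin/fetch-archie.js (sketch)
const fs = require('fs');
const os = require('os');
const path = require('path');
const { google } = require('googleapis');
const archieml = require('archieml');

async function fetchDoc(fileId) {
  // Authenticate as the service account that's been invited to the Drive folder.
  const auth = new google.auth.GoogleAuth({
    keyFile: path.join(os.homedir(), '.politico', 'service-account.json'),
    scopes: ['https://www.googleapis.com/auth/drive.readonly'],
  });
  const drive = google.drive({ version: 'v3', auth });

  // Export the Google Doc as plain text, then parse it as ArchieML.
  const res = await drive.files.export({ fileId, mimeType: 'text/plain' });
  return archieml.load(res.data);
}

fetchDoc(process.argv[2]).then((data) => {
  fs.writeFileSync('./src/content/content.json', JSON.stringify(data, null, 2));
});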

Both these methods will require a service account. We currently have two: the one running api-to-sheets and the one running Kitchen Sink. We should probably decide on one to use and also have a single Drive folder which we all belong to and which the service account has been invited to as well.

Secrets Management

Right now we use a passphrase to handle credentials. These credentials get saved in ~/.politico/interactives.json in encrypted form and are decrypted in the environment using the passphrase provided.

I propose migrating to a TOML-based profile system similar to AWS's. Dotenv exists, but it can't handle multiple profiles, and most of that library's code is parsing we wouldn't use. There is a dotenv-toml library, but it doesn't seem well maintained.

Solution? Build a quick custom library which parses TOML files located at ~/.politico/credentials and ~/.politico/config. If we want to open source this library we can make it configurable, but given its lack of complexity, I would argue that its implementation should be as simple as possible and the paths should therefore default to a standard we've agreed on.

In order to avoid duplicative configuration, we should have a default profile that is loaded first and then overridden by the specific profile called.

There might be cases in which you need values from more than one profile. I propose we create an import mode that loads every profile and adds a prefix to each one.

The question that needs to be decided is whether these variables overwrite existing system variables: if you already have AWS_ACCESS_KEY_ID exported in your shell, do these new variables respect it or override it for the Node process?
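
To make the idea concrete, here is a minimal sketch of the proposed loader (not an existing library), assuming the toml parser package and simple --credentials/--config flags; note that a strict TOML parser would want the string values in the files below quoted, so an INI-style parser is another option:

// politico-env (sketch)
const fs = require('fs');
const os = require('os');
const path = require('path');
const toml = require('toml');

function readProfiles(file) {
  const p = path.join(os.homedir(), '.politico', file);
  return fs.existsSync(p) ? toml.parse(fs.readFileSync(p, 'utf8')) : {};
}

// e.g. `node script.js --config=production --credentials=apps`
function argFor(name) {
  const arg = process.argv.find((a) => a.startsWith(`--${name}=`));
  return arg ? arg.split('=')[1] : 'default';
}

for (const file of ['credentials', 'config']) {
  const profiles = readProfiles(file);
  // Start from [default], then let the requested profile override it.
  const resolved = Object.assign({}, profiles.default, profiles[argFor(file)]);
  for (const [key, value] of Object.entries(resolved)) {
    // Open question from above: here, variables already exported in the shell win.
    if (!(key in process.env)) process.env[key] = String(value);
  }
}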

Examples

# ~/.politico/credentials

[default]
AWS_ACCESS_KEY_ID=DEEFUTLAKIAIOSFODNN7
AWS_SECRET_ACCESS_KEY=DEFAULTwJalrXUEXAMPLEKEY
GOOGLE_API_KEY=bjekvneyqiwn

[apps]
AWS_ACCESS_KEY_ID=APPSDJLLPMMOBXEXAMPLE
AWS_SECRET_ACCESS_KEY=APPSNHrylDEDNBXAMPLEKEY

# ~/.politico/config

[default]
AWS_REGION=us-east-1
AWS_BUCKET_NAME=staging.interactives.com

[staging]
AWS_BUCKET_NAME=staging.interactives.politico.com

[production]
AWS_BUCKET_NAME=interactives.politico.com

// ./script.js
import 'politico-env';
console.log('ID:', process.env.AWS_ACCESS_KEY_ID);
console.log('SECRET:', process.env.AWS_SECRET_ACCESS_KEY);
console.log('GOOGLE:', process.env.GOOGLE_API_KEY);
console.log('BUCKET:', process.env.AWS_BUCKET_NAME);
console.log('PROD BUCKET:', process.env.PRODUCTION_AWS_BUCKET_NAME);

Default Imports Only

$ node script.js
# ID: DEEFUTLAKIAIOSFODNN7
# SECRET: DEFAULTwJalrXUEXAMPLEKEY
# GOOGLE: bjekvneyqiwn
# BUCKET: staging.interactives.com
# PROD BUCKET: undefined

Specific Profiles

$ node script.js --config=production --credentials=apps
# ID: APPSDJLLPMMOBXEXAMPLE
# SECRET: APPSNHrylDEDNBXAMPLEKEY
# GOOGLE: bjekvneyqiwn
# BUCKET: interactives.politico.com
# PROD BUCKET: undefined

Prefixed Profiles

$ node script.js --prefix-env
# ID: DEEFUTLAKIAIOSFODNN7
# SECRET: DEFAULTwJalrXUEXAMPLEKEY
# GOOGLE: bjekvneyqiwn
# BUCKET: staging.interactives.com
# PROD BUCKET: interactives.politico.com

POLITICO Bin

Between the environment parser and the other build scripts listed above, I also propose we move all the processes involved in building our interactives into their own library, which we can all maintain and upgrade, and keep out of the interactives themselves. Not only will this remove tons of duplicate code currently housed in gulp directories across our interactives, it will also make fixing older interactives much easier.

We all know I shouldn't be the one naming things, but for now I'll call this npm library politico-bin. Politico Bin would be a suite of modules which perform different tasks related to building and distributing our interactives. One example has already been showcased above with politico-env. With this structure you would import politico-bin/env to configure your environment variables. Other modules would include:

The Interactives Admin

All these upgrades lead us to the biggest of them all: an all-in-one interactives admin. From here, users can make new interactives, check status and logs, and trigger rebuilds to both staging and production. Sounds great, right? But how does it work? Well, that's what this flowchart is for:

[Flowchart: interactives generator 1]

With me so far? Think that flowchart looks a little too good to be true? Well, here are some more specifics.

Interactives are started from the Django admin by any user with permissions. They fill out a form with some config options, like whether they're using Sheets and what the sheet ID is. The usual stuff. This adds a new entry to our projects model.

The post-save signal on that model triggers a project creation task in celery. That task is responsible for creating a new directory and running our yo generator in that directory with the options supplied in the model. Then it creates a new repo and does the initial push to the master branch. It will then create a new branch called staging with the same code.

Users should see the new private repo on GitHub, and they can clone it as they're used to. They should absolutely make a dev branch so they can safely save their code without triggering build processes. That repo already includes politico-bin as a dev dependency so they can use the scripts inside to help them develop. The generator should already have package.json and webpack configs set up, so it should be as simple as yarn start. Just like now, they can publish to both staging and production based on their AWS credentials, but manually pushing code to production is for amateurs.

When they're ready to publish to staging, they can merge their code into the staging branch (either locally or on GitHub), which will trigger a webhook to Labs telling it to update the staging bucket.

Once Labs gets this webhook, it will git pull origin staging, rebuild the distribution using politico-bin, and publish the newly built code to the staging bucket.

You'll be able to preview it in the staging bucket, and once you're satisfied, you can merge that code into the master branch, which will kick off the same process but for production.

But what if you're an editor who wants to make some content changes in a Google Sheet without having to clone the whole repo? Well, that's the other benefit of having a control panel. You can go into the admin and hit a single republish button, which will rerun the same build-and-publish task on the server.

Any questions, comments, concerns? Reply in this thread to keep the discussion going.

hobbes7878 commented 5 years ago

@brizandrew Very interesting proposal.

First question of a few: Does this require labs to host interactives on the server's local disk in order to rebuild/deploy? If so, do you think we should keep those files forever or is there an archiving step?

brizandrew commented 5 years ago

Yes, and I was thinking that recent builds should stay on disk, but there can be a task that empties the directories if they haven't been used in X amount of time, for space purposes. And if they need to be used again, it will re-pull.

hobbes7878 commented 5 years ago

Question 2: To use the nunjucks webpack plugin, would the expectation be that people would alter the webpack config to include new template context?

brizandrew commented 5 years ago

What kind of context beyond the hash were you thinking?

TylerFisher commented 5 years ago

Having worked in staging/production auto-deploy systems before, I might recommend making master the dev branch, and having explicit staging and production branches. It's just too easy to accidentally push to master.

brizandrew commented 5 years ago

Ya I could see that. Makes sense to me.

hobbes7878 commented 5 years ago

@brizandrew We often use nunjucks templates with arrays of data to create markup. Same thing you might use React for with a component. Think a card stack.

brizandrew commented 5 years ago

Ah okay. I think I see three ways through that:

In The Config

We could put the template data in the webpack config file and load it in through html-webpack-plugin to generate the file. I would probably not go with this, in order to keep the webpack config as clean as possible.

In A Context File

We could create an intermediary file which simply handles data loading and processing, and which is then imported inside the webpack config and used there. This would keep the config clean, but it could introduce a promise structure into the config file if that data needed to be loaded externally. That wouldn't be terrible, but it still doesn't seem perfect to me.
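
For what it's worth, the intermediary file could be as small as this; the file name and data source are hypothetical, using the card-stack case as the example:

// context.js (sketch)
const fs = require('fs');

// Handles all data loading/shaping for the template, so the webpack config
// just imports this and passes the result in as the nunjucks context.
module.exports = function getContext() {
  const cards = JSON.parse(fs.readFileSync('./src/content/cards.json', 'utf8'));
  return { cards };
};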

Hijack the HTML Extract Plugin Using Hooks

html-webpack-plugin has a new feature called hooks which essentially allows us to hijack different parts of the creation process. We could, theoretically, use this avenue to load the separate context file above or maybe even an entire template. If this sounds better to folks, I can do more digging into the hooks.

See new reply

hobbes7878 commented 5 years ago

Question 3: You haven't yet mentioned metadata. Do you imagine all of that is written in labs? Are there checks to make sure it's filled out correctly before deploying?

brizandrew commented 5 years ago

No, I was thinking we keep handling the metadata the same as we do now: a JSON file that is loaded into the SSR HTML. If we wanted to externalize it, I think we should go to a Google Doc before we go to a database admin thing. Unless the database admin published a JSON file somewhere that the SSR script knows to pull from.

hobbes7878 commented 5 years ago

Question 4: Am I reading secrets management correctly that you don't imagine switching between profiles but making all profiles available in the env? How do you imagine switching environments works in this world, say between local and labs?

brizandrew commented 5 years ago

The execution of the secrets remains the same. We toggle between profiles by using arguments in the build script such as --config=production. The only difference I'm proposing is how the secrets are kept and how they're used within the apps. Local and labs would both have their own credentials and config files in the same directory, so when the build scripts run locally, they read the files on my machine, and when they run on labs, they read the files on that machine. Just like the AWS CLI would work if we were using it on the server.

Edit: I should reiterate/clarify that these are global files located in ~/.politico/ on both machines and not in the repo at all.

brizandrew commented 5 years ago

Update to Question 3

It has come to my attention that html-webpack-plugin and Nunjucks do not play nicely together. I would adjust my proposal to instead make use of add-asset-webpack-plugin, which would call our context script to add our new file to the webpack graph. That context file, again, would generate HTML via a nunjucks loader like this rather than the webpack one.

Using this more agnostic method means that adding an option for React SSR would be made much easier.
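
A sketch of that adjusted approach: the add-asset-webpack-plugin constructor shown here is an assumption based on its README and should be verified, while the configure/render calls are nunjucks's standard API and the context file is the hypothetical one sketched earlier.

// webpack.config.js (sketch)
const nunjucks = require('nunjucks');
const AddAssetPlugin = require('add-asset-webpack-plugin'); // signature assumed
const getContext = require('./context'); // the intermediary context file from above

nunjucks.configure('./templates');

module.exports = {
  plugins: [
    // Render index.html with nunjucks itself (outside webpack's loader chain)
    // and emit the result as an asset in the webpack graph.
    new AddAssetPlugin('index.html', () =>
      nunjucks.render('index.html', getContext())
    ),
  ],
};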

hobbes7878 commented 5 years ago

Very good.

OK, meta question: Tell me what advantage you see to rolling our own deployment platform on labs as opposed to integrating with a CI service like travis, circleci, drone, whatever?

brizandrew commented 5 years ago

I'm not opposed to using a CI to build and deploy. Travis already has S3 as a built-in deployment routine. I'll admit I'm only familiar with using existing build routines. Could be worth spending more time looking into it further. Cost is also a factor unless we want to make all our interactives public in order to deploy them.

We would need the CI client to have an API (like this from Travis) which allows us to call a rebuild without pushing code in order to allow users to trigger an update via a labs admin.
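
For reference, a sketch of what triggering a rebuild through Travis's v3 "trigger build" API could look like; the repo slug, token env var, and node-fetch dependency are assumptions:

// trigger-rebuild.js (sketch)
const fetch = require('node-fetch');

async function triggerRebuild(slug, branch) {
  const res = await fetch(
    `https://api.travis-ci.com/repo/${encodeURIComponent(slug)}/requests`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Travis-API-Version': '3',
        Authorization: `token ${process.env.TRAVIS_TOKEN}`,
      },
      // Kicks off a build of the given branch without pushing any code.
      body: JSON.stringify({ request: { branch } }),
    }
  );
  return res.json();
}

triggerRebuild('The-Politico/some-interactive', 'staging').then(console.log);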

RE: Travis CI Cost.

RE: Nodejs And TravisCI

brizandrew commented 5 years ago

So in the case of using a CI the flow would look something like this. It does clear up roles a lot better I think.

[Flowchart: interactives generator 2]

brizandrew commented 5 years ago

After some more talk with Jon, I think I agree that as per principle two, the creation step for a new interactive should be made more frictionless. For that reason, here is an updated version which centers around the local machine and only uses labs for republishing and record keeping:

[Flowchart: interactives generator 3 1]

hobbes7878 commented 5 years ago

https://drone.io/