brizandrew opened this issue 5 years ago
@brizandrew Very interesting proposal.
First question of a few: Does this require labs to host interactives on the server's local disk in order to rebuild/deploy? If so, do you think we should keep those files forever or is there an archiving step?
Yes, and I was thinking recent builds should stay on disk, but there can be a task that empties directories that haven't been used in X amount of time to save space. If they need to be used again, it will re-pull.
Question 2: To use the nunjucks webpack plugin, would the expectation be that people would alter the webpack config to include new template context?
What kind of context beyond the hash were you thinking?
Having worked in staging/production auto-deploy systems before, I might recommend making `master` the dev branch, and having explicit `staging` and `production` branches. It's just too easy to accidentally push to master.
Ya I could see that. Makes sense to me.
@brizandrew We often use nunjucks templates with arrays of data to create markup. Same thing you might use React for with a component. Think a card stack.
Ah okay. I think I see three ways through that:
1. We could put the template data in the webpack config file and load it in through `html-webpack-plugin` to generate the file. I would probably not go with this in order to keep the template as clean as possible.
2. We could create an intermediary file which simply handles data loading and processing, which is then imported inside the webpack config and used there. This would keep the config clean, but it would potentially introduce a promise structure into the config file if that data needed to be loaded externally. That wouldn't be terrible, but it still doesn't seem perfect to me.
3. `html-webpack-plugin` has a new feature called hooks which essentially allows us to hijack different parts of the creation process. We could, theoretically, use this avenue to load the separate context file above or maybe even an entire template. If this sounds better to folks, I can do more digging into the hooks.
(See the newer reply below.)
Question 3: You haven't yet mentioned metadata. Do you imagine all of that is written in labs? Are there checks to make sure it's filled out correctly before deploying?
No, I was thinking we keep handling the metadata the same as we do now: a JSON file that is loaded into the SSR HTML. If we wanted to externalize it, I think we should go to a Google Doc before we go to a database admin. Unless the database admin publishes a JSON file somewhere that the SSR script knows to pull from.
Question 4: Am I reading secrets management correctly that you don't imagine switching between profiles but making all profiles available in the env? How do you imagine switching environments works in this world, say between local and labs?
The execution of the secrets remains the same. We toggle between profiles by using arguments in the build script such as `--config=production`. The only difference I'm proposing is how the secrets are kept and how they're used within the apps. Local and labs would both have their own `credentials` and `config` files in the same directory, so when the build scripts are run locally, they call the files on my machine, and when they're called on labs, they call the files on that machine. Just like the AWS CLI would work if we were using it on the server.
Edit: I should reiterate/clarify that these are global files located in `~/.politico/` on both machines and not in the repo at all.
It has come to my attention that HTML Webpack Plugin and Nunjucks do not play nice. I would adjust my proposal to instead make use of add-asset-webpack-plugin which would call our context script to add our new file to the webpack graph. That context file, again, would generate html via a nunjucks loader like this rather than the webpack one.
Using this more agnostic method means that adding an option for React SSR would be made much easier.
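To make that concrete, here's a rough sketch of what that context script could look like. The file names, data import, and hash argument are placeholders, and the actual plugin wiring is left out:

```js
// render-index.js — hypothetical context script called from the webpack config.
// Renders templates/index.njk with nunjucks and returns the HTML string, which
// add-asset-webpack-plugin (or any similar plugin) could add to the build graph.
const nunjucks = require('nunjucks');
const data = require('./src/content/data.json'); // placeholder content file

nunjucks.configure('templates', { autoescape: true });

module.exports = (hash) =>
  nunjucks.render('index.njk', {
    hash, // cache-busting hash passed in from the webpack config
    cards: data.cards, // example of array data rendered into markup, per the card-stack case above
  });
```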
Very good.
OK, meta question: Tell me what advantage you see to rolling our own deployment platform on labs as opposed to integrating with a CI service like travis, circleci, drone, whatever?
I'm not opposed to using a CI to build and deploy. Travis already has S3 as a built-in deployment routine. I'll admit I'm only familiar with using existing build routines. Could be worth spending more time looking into it further. Cost is also a factor unless we want to make all our interactives public in order to deploy them.
We would need the CI client to have an API (like this from Travis) which allows us to call a rebuild without pushing code in order to allow users to trigger an update via a labs admin.
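For reference, triggering a rebuild through Travis's v3 API is roughly a single authenticated POST like this (the repo slug and token are placeholders):

```js
// trigger-build.js — sketch of asking Travis CI to rebuild without pushing code.
const fetch = require('node-fetch');

fetch('https://api.travis-ci.com/repo/politico%2Fsome-interactive/requests', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Travis-API-Version': '3',
    Authorization: `token ${process.env.TRAVIS_TOKEN}`,
  },
  body: JSON.stringify({ request: { branch: 'master' } }),
}).then((res) => console.log('Travis responded:', res.status));
```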
RE: Travis CI Cost.
So in the case of using a CI the flow would look something like this. It does clear up roles a lot better I think.
After some more talk with Jon, I think I agree that as per principle two, the creation step for a new interactive should be made more frictionless. For that reason, here is an updated version which centers around the local machine and only uses labs for republishing and record keeping:
The interactives generator has served us well for a solid amount of time. It has served Beatrice and me significantly less (I think we’ve only used it once), but as a team, I hear it’s been great.
But all good software eventually grows old and needs to be upgraded. This is my plan to do just that.
Design Principles
This plan was created with a few key principles in mind.
1. Eliminating Redundancy
We don’t need Gulp. Promises and async have made npm’s native task management viable enough to use. Even our Webpack config is redundant. Redundant config makes debugging more difficult: if a build is failing, it should be as clear as possible where the problem is.
2. Simplifying On-Boarding
As we hire new developers, we need to make the setup process easier. Ideally it would be one install; unfortunately, for security, credentials also need to be set up. But those should be the only two steps.
3. Reducing Friction Around Development
From starting a new project to managing static files to publishing, these steps should not need much thought from interactives devs. The rig needs to be flexible enough to handle agile development but strict enough to discourage bad code.
4. Centralizing Content
Devs get new jobs. Devs get sick. Devs go MIA sometimes. Editors (both on the interactives team and across the newsroom) need a way to edit content themselves.
5. Centralizing Configuration
Along that same vein, we have too many config files. Each repo should have ONE location for configuration with as little redundancy as possible (see principle one).
With these five principles in mind, let’s see how to achieve some of them:
De-Gulpify Processes
First things first: we don’t need Gulp. We’ve already removed it from our newer builds, and there are only a couple of loose ends that need to be de-gulpified. Here’s my plan to do that, task by task.
Archie
We should be using the Archie Webpack loader to load in files once they’re downloaded. This will allow us to `import` them just like we do markdown and text from a `./src/content` directory. I would add to this that we should have a pure JSON downloader for non-Archie data files, because it’s easy and I have ideas for how to use this in the future.

AWS
This was pretty much already degulpified in DIY Congress here.
Webpack Build
Also pretty much done in DIY Congress here. We should use the more complex config system used on Election pages to avoid all the duplicate code we normally have in these and create configurable environments on a project-by-project basis.
Data
Just use `cp` instead of gulp.
Webpack Dev Server
Also already done here.
Local Dist Preview Server
This script isn’t actually tied to gulp. We can just port it over. I don’t even know how much it’s needed now that Chrome has its own preview server capabilities built in.
Build HTML (With Hashes)
Rather than creating HTML templates via template strings like we did in DIY Congress or have a separate nunjucks template from our preview index.html like we did in election pages, I propose we use the nunjucks-webpack-plugin to create a single index file in templates.
Hashes are optional, but given our success with them so far to avoid caching errors and help in debugging, I suggest we keep them. We can keep the hash generator we used in election pages and import it in our Webpack config like this. This hash can be used to create the build files like we do here and here and then pass that hash into the context for the nunjucks template in the loader in that same config file.
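Roughly, in the webpack config it could look like this. The `templates` option shape is from the plugin's README as I recall it, and the random hash below is a stand-in for the election-pages hash generator — double-check both:

```js
// webpack.config.js (excerpt) — sketch of hashed build files + a nunjucks-built index
const crypto = require('crypto');
const NunjucksWebpackPlugin = require('nunjucks-webpack-plugin');

// Stand-in for the election-pages hash generator: a short random build hash.
const hash = crypto.randomBytes(4).toString('hex');

module.exports = {
  entry: './src/app.js',
  output: {
    filename: `app.${hash}.js`, // hashed build file, as we do now
  },
  plugins: [
    new NunjucksWebpackPlugin({
      templates: [
        {
          from: 'templates/index.njk',
          to: 'index.html',
          context: { hash }, // hash passed into the nunjucks template context
        },
      ],
    }),
  ],
};
```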
Image Processing
The popular node image processing library is jimp. With a bit of extra code around it to get a list of all files in the `img` directory and loop through them, we can easily replace `gulp-responsive`.
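A sketch of that wrapper (paths, widths, and quality are just examples):

```js
// process-images.js — sketch of replacing gulp-responsive with jimp.
const fs = require('fs');
const path = require('path');
const Jimp = require('jimp');

const SRC = path.join(__dirname, 'src/img');
const OUT = path.join(__dirname, 'dist/img');
const WIDTHS = [400, 800, 1600]; // example responsive widths

fs.mkdirSync(OUT, { recursive: true });

fs.readdirSync(SRC).forEach((file) => {
  WIDTHS.forEach((width) => {
    Jimp.read(path.join(SRC, file))
      .then((img) =>
        img
          .resize(width, Jimp.AUTO) // keep aspect ratio
          .quality(80) // example JPEG quality
          .write(path.join(OUT, `${path.parse(file).name}-${width}${path.extname(file)}`))
      )
      .catch((err) => console.error(`Could not process ${file}:`, err));
  });
});
```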
Spreadsheet
The current system is deeply tied to using GDrive. We can probably replace most of it using the same code used to make `api-to-sheets`, specifically the GAPI class. This class also handles pulling down the data, so we would just have to create a script that runs this.

Watch Scripts (code, data, images)
Watch scripts for code will be handled by the webpack dev server. Watch scripts to run the image-processing script and cp the data can be handled through a parallel nodemon script run on `yarn start`, as suggested here.
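In package.json terms it could look something like this. The script names and paths are illustrative, and `concurrently` is just one way to run them in parallel:

```json
{
  "scripts": {
    "dev": "webpack-dev-server --config webpack.config.js",
    "data": "cp -r src/data dist/data",
    "img": "node process-images.js",
    "watch:data": "nodemon --watch src/data --ext json,csv --exec \"yarn data\"",
    "watch:img": "nodemon --watch src/img --ext jpg,jpeg,png,gif,svg --exec \"yarn img\"",
    "start": "concurrently \"yarn dev\" \"yarn watch:data\" \"yarn watch:img\""
  }
}
```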
Replacing GDrive

Replacing GDrive is probably going to be one of the bigger challenges in modernizing our rig.
For spreadsheets it should be easy enough to port over GAPI from api-to-sheets and add a method for the rows API endpoint.
For Archie we’re going to have to configure an app using the Drive API, probably using the official node library. The Archie folks also already have an example where they do something like that here but in an express server which we won’t want.
Both these methods will require a service account. We currently have two: the one running api-to-sheets and the one running Kitchen Sink. We should probably decide on one to use and also have a single Drive folder which we all belong to and which the service account has been invited to as well.
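A sketch of the service-account flow for both cases, using the official googleapis client (the key path, IDs, and the plain-text export are assumptions):

```js
// gdrive.js — sketch of pulling a sheet and an ArchieML doc with one service account.
const { google } = require('googleapis');
const archieml = require('archieml');

const auth = new google.auth.GoogleAuth({
  keyFile: `${process.env.HOME}/.politico/service-account.json`, // placeholder key location
  scopes: [
    'https://www.googleapis.com/auth/spreadsheets.readonly',
    'https://www.googleapis.com/auth/drive.readonly',
  ],
});

// Spreadsheet rows — roughly what a ported GAPI rows() method would wrap.
async function rows(spreadsheetId, range) {
  const sheets = google.sheets({ version: 'v4', auth: await auth.getClient() });
  const res = await sheets.spreadsheets.values.get({ spreadsheetId, range });
  return res.data.values;
}

// Archie — export the Google Doc as plain text, then run it through archieml.
async function archie(documentId) {
  const drive = google.drive({ version: 'v3', auth: await auth.getClient() });
  const res = await drive.files.export({ fileId: documentId, mimeType: 'text/plain' });
  return archieml.load(res.data);
}

module.exports = { rows, archie };
```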
Secrets Management
Right now we use passphrase to handle credentials. These credentials get saved in `~/.politico/interactives.json` in an encrypted form and are decrypted in the environment using the passphrase provided.

I propose migrating to a TOML-based profile system similar to AWS’s. Dotenv exists, but it can’t handle multiple profiles, and most of that library’s code is the parsing and therefore useless. There does exist a dotenv-toml library, but it doesn’t seem too well maintained.
Solution? Build a quick custom library which parses a TOML file located in `~/.politico/credentials` and `~/.politico/config`. If we want to open source this library we can make it configurable, but given its lack of complexity, I would argue that its implementation should be as simple as possible and therefore the paths should default to a standard we’ve agreed on.

In order to avoid duplicative code, we should have a `default` profile that is loaded in and then overridden by the specific profile called.

There might be cases in which you need values from more than one profile. I propose we create an import that loads every profile and adds a prefix to each one.
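A minimal sketch of that loader, assuming the `toml` npm package for parsing (the file layout and merge rules here are just one option):

```js
// politico-env (sketch) — loads profiles from ~/.politico/credentials and ~/.politico/config.
const fs = require('fs');
const os = require('os');
const path = require('path');
const toml = require('toml');

const DIR = path.join(os.homedir(), '.politico');

const read = (file) => {
  const p = path.join(DIR, file);
  return fs.existsSync(p) ? toml.parse(fs.readFileSync(p, 'utf8')) : {};
};

module.exports = (profile = 'default') => {
  const merged = {};
  // default profile first, then the requested profile overrides it
  [read('credentials'), read('config')].forEach((file) => {
    Object.assign(merged, file.default || {}, file[profile] || {});
  });
  // open question below: should these respect variables already exported in the shell?
  Object.entries(merged).forEach(([key, value]) => {
    process.env[key] = String(value);
  });
  return merged;
};
```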
The question that needs to be decided is whether these variables overwrite the current system variables. If you already have `AWS_ACCESS_KEY_ID` exported to your system, do these new variables respect that or override it for the node process?

Examples
Default Imports Only
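Hypothetically, with the loader sketched above (the call shape isn't settled):

```js
// Loads only the [default] profile from ~/.politico/credentials and ~/.politico/config.
const env = require('politico-bin/env');
env();
```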
Specific Profiles
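Again hypothetical — a named profile layered over the defaults:

```js
// Loads [default] first, then overrides it with the [production] profile.
const env = require('politico-bin/env');
env('production');
```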
Prefixed Profiles
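One possible shape for the prefixed import (the `all()` helper is invented here for illustration):

```js
// Loads every profile and prefixes its variables, e.g.
// process.env.PRODUCTION_AWS_ACCESS_KEY_ID alongside process.env.STAGING_AWS_ACCESS_KEY_ID.
const env = require('politico-bin/env');
env.all();
```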
POLITICO Bin
Between the environment parser and other build scripts listed above, I also propose that we should move all the processes involved in building our interactives into their own library which we can all maintain, upgrade, and keep out of our interactives themselves. Not only will this remove tons of duplicate code currently housed in gulp directories across our interactives, but it will make fixing older interactives much easier.
We all know I shouldn’t be the one naming things, but for now I’ll call this npm library `politico-bin`. Politico Bin would be a suite of modules which perform different tasks related to building and distributing our interactives. One example of these modules has already been showcased above with `politico-env`. With this structure you would import `politico-bin/env` to configure your environment variables. Other modules would include the rest of the build scripts above, exposed through the package’s `./bin` directory.

The Interactives Admin
All these upgrades lead us to the biggest of them all: an all-in-one interactives admin. From here users can make new interactives, check status and logs, and trigger rebuilds to both staging and production. Sounds great, right? But how does it work? Well, that’s what this flowchart is for:
With me so far? Think that flowchart looks a little too good to be true? Well here’s some more specifics.
1. Interactives are started from the Django admin by any user with permissions. They fill out a form with some config options, like whether they’re using sheets and what the sheet ID is. The usual stuff. This adds a new entry in our `projects` model.
2. The post-save signal on that model triggers a project creation task in celery. That task is responsible for creating a new directory and running our yo generator in that directory with the options supplied in the model. Then it creates a new repo and does the initial push to the `master` branch. It will then create a new branch called `staging` with the same code.
3. Users should see the new private repo on Github and they can clone it as they’re used to. They should absolutely make a dev branch so they can safely save their code without triggering build processes. That repo already includes `politico-bin` as a dev dependency so they can use the scripts inside to help them develop. The generator should already have package.json and webpack configs set up, so it should be as simple as `yarn start`.
4. Just like now, they can publish to both staging and production based on their AWS credentials, but manually pushing code to production is for amateurs. When they’re ready to publish to staging, they can merge their code into the `staging` branch (either locally or on Github), which will trigger a webhook to Labs telling it to update the staging bucket.
5. Once labs gets this webhook, it will `git pull origin staging`. Then it will rebuild the distribution using `politico-bin`. And finally, it will publish this newly built code to the staging bucket (see the sketch after this list).
6. You’ll be able to preview it in the staging bucket, and once you’re satisfied you can merge that code into the `master` branch, which will kick off the same process but for the production server.

But what if you’re an editor who wants to make some content changes in a Google Sheet without having to clone the whole repo? Well, that’s the other benefit of having a control panel. You can go into the admin and hit a single republish button which will rerun the same build and publish task on the server.
Any questions, comments, concerns? Reply in this thread to keep the discussion going.