Code-for-All / lockdown

Project Lockdown (an initiative from The IO Foundation) is a civic tech, interactive platform providing an overview of the state of Human and Digital Rights around the globe. It evaluates policies obtained from high quality sources that may impact their observance. It provides, among other tools, a layered map interface that allows for a visual representation of the policies adopted, assisting a broad range of stakeholders in understanding the global state of their Rights. This empowers them to become active agents of global change.
https://ProjectLockdown.world
Apache License 2.0

Reduce number of CosmosDB writes #371

Open daphnecys opened 3 years ago

daphnecys commented 3 years ago

Issue: The current GitHub Action runs "Update Data" every 10 minutes. That action calls batchGetTerritoriesEntryData() in lockdown/backend/src/loaders/lockdown/lockdown.js.

This code parses the Data Set Entry GSheet, recreates the JSONs, and reinserts them into CosmosDB (MongoDB), clearing the DB first. Doing this on every run generates a huge number of unnecessary inserts. We want to clear the DB and do the entire parse only once every half hour, and apply just the updates every 10 minutes.

The entire parse is still necessary to ensure that the bug fixed by this code change https://github.com/Code-for-All/lockdown/commit/2590e307403816982d8aa14326a67134b5d97293 remains fixed, i.e. that country entries are deleted where the entries are blank.
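As a rough illustration of what "just the updates" could mean, the sketch below filters the freshly parsed entries down to those that differ from what is already stored, so an update-only run writes fewer documents. Everything in it is an assumption for illustration: `selectChangedEntries`, the `id` field, and the entry shape are hypothetical and not taken from lockdown.js.

```javascript
// Hypothetical helper, not part of lockdown.js.
// Returns only the parsed entries that are new or differ from the stored copy,
// so an update-only run can skip writing unchanged documents.
function selectChangedEntries(storedById, parsedEntries) {
  return parsedEntries.filter((entry) => {
    const stored = storedById.get(entry.id);
    // Comparing JSON strings assumes both sides are built with the same key order.
    return stored === undefined || JSON.stringify(stored) !== JSON.stringify(entry);
  });
}

// Made-up sample data; the real entry shape may differ.
const stored = new Map([
  ['MY', { id: 'MY', lockdown: 'partial' }],
  ['SG', { id: 'SG', lockdown: 'none' }],
]);
const parsed = [
  { id: 'MY', lockdown: 'partial' }, // unchanged, skipped
  { id: 'SG', lockdown: 'full' },    // changed, written
  { id: 'ID', lockdown: 'none' },    // new, written
];

console.log(selectChangedEntries(stored, parsed).map((e) => e.id)); // [ 'SG', 'ID' ]
```

A filter like this never removes anything: a stored entry whose sheet rows have gone blank simply does not appear in the parsed list. That is one reason the periodic clear and full parse would still be needed.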

Possible solution: Two methods are possible:

1) Create two GitHub Actions. One runs every half hour and calls the function with a parameter that tells it to clear the data in the DB and regenerate the JSONs. The other runs every 10 minutes in between and calls the function with no parameter, so it only applies the updates.

2) Keep an environment variable that stores the timestamp of the last clear-all run. If more than half an hour has passed since then, clear the data in the DB and regenerate the JSONs; otherwise just update without clearing.
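The decision each method has to make can be sketched as below. This is a minimal sketch under stated assumptions: the function names and the `--clear-all` flag are hypothetical, and the threshold uses "at least 30 minutes" rather than strictly "more than" so that a run landing exactly on the half hour still triggers a rebuild.

```javascript
// Hypothetical names throughout; nothing here exists in lockdown.js.
const HALF_HOUR_MS = 30 * 60 * 1000;

// Method 1: the half-hourly action passes an explicit flag, the 10-minute one does not.
function isClearAllRequested(argv) {
  return argv.includes('--clear-all');
}

// Method 2: decide from a stored timestamp (in milliseconds) whether to clear and rebuild.
function shouldClearAll(lastClearAllMs, nowMs, intervalMs = HALF_HOUR_MS) {
  // A missing or unparsable timestamp forces a full rebuild.
  if (!Number.isFinite(lastClearAllMs)) return true;
  return nowMs - lastClearAllMs >= intervalMs;
}

const minutes = (n) => n * 60 * 1000;

console.log(isClearAllRequested(['node', 'update.js', '--clear-all'])); // true
console.log(isClearAllRequested(['node', 'update.js']));                // false
console.log(shouldClearAll(minutes(0), minutes(10)));  // false
console.log(shouldClearAll(minutes(0), minutes(30)));  // true
console.log(shouldClearAll(undefined, minutes(10)));   // true
```

One caveat for method 2: an environment variable set during one workflow run is not carried over to the next run, so the timestamp would have to be kept somewhere persistent, for example in a small document in the database itself.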

(Daphne's ref: June 25 convo with Mark)