SSWConsulting / SSW.Rules

Generator for ssw.com.au/rules
https://www.ssw.com.au/rules
MIT License

CI/CD - History.json update process should be moved into the build pipeline #459

Open · piers-sinclair opened this issue 3 years ago

piers-sinclair commented 3 years ago

@bradystroud @william-liebenberg @wicksipedia

Sometimes a user is not recorded as the last person to modify a rule, even though they have just modified it. This happens because users aren't forced to have GitHub Actions enabled on their forks. Additionally, the created date is not part of this process, but it should be.

To solve these issues:

  1. Remove the GitHub Action and add a build step in Azure DevOps on the main repo to perform the process
  2. Add the created date to this process
  3. Remove the created date from Netlify CMS

AB#61012

wicksipedia commented 3 years ago

Hey all, @william-liebenberg and I are on the same page for this - there shouldn't be a history.json file. All the data should be in git already - just need to query it out. Preferably using GraphQL or as an extra step in the Gatsby build.

This would alleviate the need for the action step entirely.

christianmorfordwaitessw commented 3 years ago

@bradystroud @wicksipedia @william-liebenberg

Hi All,

As per my conversation with @pierssinclairssw, we tried several ways to query the data and eventually ended up using the History.json solution.

- Option 1 (Read the modified file details from the GitHub repo files) At first, we intended to use a Gatsby plugin such as https://www.gatsbyjs.com/plugins/gatsby-source-filesystem to get the file information. This did not work because we have multiple GitHub repositories for SSW.Rules (SSW.Rules + SSW.Rules.Content) and the plugin was unable to read the SSW.Rules.Content repo.

- Option 2 (Use the GitHub API to get the modified details) Using the GitHub API was the next plan, as it contained all the data we needed. Unfortunately, we hit the API rate limits (5000 points/hour), and it was very slow because we needed to recalculate the details for every rule. https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

- Option 3 (Use a History.json file to store the modified details) This was the last option we tried and was similar to this solution https://github.com/joshuatz/git-date-extractor

I think we still need some form of caching so we only need to get the date timestamps/authors on updated rules, rather than refreshing the data for all the rules.

-Christian

bradystroud commented 3 years ago

As per my conversation with @pierssinclairssw @taineriley1, here are the options we have come up with. We don't think any of these are great.

Option 1 - On load - Retrieve data using our own API

Option 2 - On load - Retrieve data with GitHub API (same as above but no cache)

Option 3 - On build - PowerShell script

Option 4 - On build - Store history in Azure Blob/Table/CDN Storage

Option 5 - On build - Have a metadata file for each rule

bradystroud commented 3 years ago

cc: @adamcogan @christianmorfordwaitessw @william-liebenberg

As per my conversation with @wicksipedia, @pierssinclairssw and @taineriley1, we have decided not to use any of the above solutions.

This is why the above options are not ideal:

  • Options 1 and 2 - On load - Cache management can be difficult and hard to maintain. Storage options could be expensive if user load is high (e.g. Cosmos DB)
  • Option 3 - On build - The PowerShell script is too slow (~30 min)
  • Option 4 - On build - Could have race conditions, and will add more complexity
  • Option 5 - On build - Will not solve the GitHub Action problem

Instead we are going to try these alternatives.

Alternative 1 - Use the local option in gatsby-source-git with a local folder on the build server

  1. In the build, have a yarn script that clones the full content repository (this will include the git history)
  2. Point gatsby-source-git to the local clone
  3. Use https://www.gatsbyjs.com/plugins/gatsby-plugin-changelog-context/?=source-gi#adding-filesystem-data-to-your-page-query to get history data

Alternative 2 - Increase efficiency of PowerShell script

  1. Follow Option 3 from above, but look into ways to make it more efficient, e.g. optimising git commands https://stackoverflow.com/questions/21735435/git-clone-changes-file-modification-time
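As a rough illustration of the "optimising git commands" idea, the biggest saving would likely come from running a single `git log` over the whole content repo instead of one call per rule, then deriving each file's created and last-modified details from that one output. The sketch is in Python for brevity (the same approach would carry over to the PowerShell script). It assumes the log was produced with `git log --name-only --format="%x1e%H%x1f%aI%x1f%an"` on a full (non-shallow) clone; `FileHistory` and `parse_git_log` are hypothetical names, and renames are not followed.

```python
from __future__ import annotations

from dataclasses import dataclass

RECORD_SEP = "\x1e"  # produced by %x1e at the start of every commit
FIELD_SEP = "\x1f"   # produced by %x1f between the header fields


@dataclass
class FileHistory:
    created: str
    created_by: str
    last_modified: str
    last_modified_by: str


def parse_git_log(log_output: str) -> dict[str, FileHistory]:
    """Derive created/last-modified details for every file from ONE git log.

    Expects the output of:
        git log --name-only --format="%x1e%H%x1f%aI%x1f%an"
    git log lists commits newest first by default, so the first commit seen
    for a path is its last modification and the last one seen is its creation.
    Renames are not followed (that would need --follow, which is per file).
    """
    history: dict[str, FileHistory] = {}
    for record in log_output.split(RECORD_SEP):
        if not record.strip("\n"):
            continue  # the text before the first separator is empty
        header, *paths = record.split("\n")
        _sha, date, author = header.split(FIELD_SEP, 2)
        for path in paths:
            path = path.strip()
            if not path:
                continue  # blank line between the header and the file list
            if path not in history:
                # Newest commit touching this file: it is the last modification.
                history[path] = FileHistory(date, author, date, author)
            else:
                # An older commit: keep moving the created details further back.
                history[path].created = date
                history[path].created_by = author
    return history
```

Feeding it real data could look like `subprocess.run(["git", "log", "--name-only", "--format=%x1e%H%x1f%aI%x1f%an"], cwd=<path to the content clone>, capture_output=True, text=True).stdout`. Note that git only records author names and emails, so a GitHub username would still need a separate lookup.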

Alternative 3 - Ask for help

  1. Call Matt W again

bradystroud commented 3 years ago

As per my conversation with @pierssinclairssw, we have found that Alternative 1 doesn't work because you can't run git commands across repos.

Here is an updated version of Alternative 1.

Alternative 1 - Use the local option in gatsby-source-git from within the Rules repo on the build server (SSW.Rules/.git must be deleted)

  1. In the build, have a yarn script that clones the full content repository (this will include the git history)
  2. If it is building on the build server, remove the SSW Rules .git folder
  3. Use gatsby-source-git on a local folder within the repo on the build server.

We have not investigated Alternative 2 much, but we would prefer Alternative 1 because, even if we can increase the efficiency of Alternative 2, it may not scale well and it also requires manual folder manipulation in the repo on the build server.

wicksipedia commented 3 years ago

You will need to add multiple checkout steps to the pipeline to get the other repo.

https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/multi-repo-checkout?view=azure-devops

bradystroud commented 3 years ago

Fixing the SSW Rules history data problem would have better ROI than fixing all the problems it creates. I used to spend ~16 hours a month fixing merge conflicts, helping rules editors with failing PRs, fixing empty history data on rules and more. These are all problems caused by the existing history data solution.

I estimate it would take ~32 hours to fix the history data problem.
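Taking both estimates at face value, the fix would pay for itself in about two months of avoided support work (a rough calculation that ignores any upkeep the new solution needs):

```latex
\text{break-even time} = \frac{32\ \text{h (one-off fix)}}{16\ \text{h/month (ongoing fixes)}} = 2\ \text{months}
```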

piers-sinclair commented 3 years ago

CC: @bradystroud @wicksipedia @jakebayliss @adamcogan @christianmorfordwaitessw @JackDevAU

As per my conversation with Adam, I have been getting lots of calls every week from users thinking they have done something wrong because the action fails.

We should treat this problem as the number 1 priority to be worked on next for Rules.

wicksipedia commented 3 years ago

Hey! 👋 Just an FYI - I had this problem with a PR https://github.com/SSWConsulting/SSW.Rules.Content/pull/1697

Hona commented 2 years ago

CC: @bradystroud @wicksipedia @jakebayliss @adamcogan @christianmorfordwaitessw @JackDevAU @pierssinclairssw @taineriley1 @JackLeerson

(Checked by @bradystroud) Hi All,

As we all agree - keeping history.json in the git history is a terrible solution:

  1. It will continue to grow in both size and complexity (GitHub limits file size, and merge conflicts become harder to resolve as the file grows)
  2. It causes merge conflicts, meaning changes made in Netlify CMS require dev fixes
  3. The file does not update when forks are used, due to GitHub permissions (the GitHub Action won't run)

📃 Quick Summary

Remove the history.json from git history, and generate it on demand from a cache via the deploy pipeline

🔨 Proposed Solution:

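  1. Create an Azure Cosmos DB with:

     Sync History

     | Commit Hash |
     | ----------- |
     | 7929f45...  |

     (This will only be one row, acting as a pointer to the commit we have synced up to.)

     Rule History Cache

     | Markdown File Path         | Changed By Display Name | Changed At Date Time | GitHub Username |
     | -------------------------- | ----------------------- | -------------------- | --------------- |
     | address-formatting/rule.md | Adam Cogan              | 2009-02-15T00:00:00Z | adamcogan       |
     | ...                        | ...                     | ...                  | ...             |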
  2. Create the following Azure Functions:

    • Sync (this will be the first part of the deploy pipeline)

      1. Fetch the latest Sync History commit hash
      2. Cache from the latest hash -> HEAD on master

        1. Run git log from the hash -> HEAD, and find all files changed
        2. Update the cache using the output from this step
    • Fetch All

      1. Fetch all data from the Rule History Cache
  3. Add a build step to call the Fetch All Azure Function and save the result as the history.json file. This means no changes are required to the Gatsby build step.
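To make the Sync and Fetch All steps more concrete, here is a minimal in-memory sketch of the logic, with plain Python objects standing in for the two Cosmos DB containers. It assumes the commits arrive oldest first (for example from `git log <hash>..HEAD --reverse --name-only`), that only `rule.md` files are cached, and that the GitHub username has already been resolved for each commit, since git itself does not record it. All names (`Commit`, `InMemoryStore`, `sync`, `fetch_all`) and the JSON shape are hypothetical, not the real history.json format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Commit:
    sha: str
    changed_at: str          # ISO 8601 timestamp
    changed_by: str          # author display name
    github_username: str     # assumption: resolved elsewhere, git does not store it
    files: list[str] = field(default_factory=list)


@dataclass
class RuleHistoryRow:
    markdown_file_path: str
    changed_by_display_name: str
    changed_at_date_time: str
    github_username: str


class InMemoryStore:
    """Hypothetical stand-in for the Sync History and Rule History Cache containers."""

    def __init__(self) -> None:
        self.sync_commit_hash: Optional[str] = None  # the single Sync History row
        self.rule_history: dict[str, RuleHistoryRow] = {}


def sync(store: InMemoryStore, new_commits: list[Commit], head_sha: str) -> int:
    """Apply commits from the stored hash up to HEAD, then move the pointer.

    new_commits must be oldest first, so the newest change to a file wins.
    Returns the number of rule files whose cache row was updated.
    """
    updated: set[str] = set()
    for commit in new_commits:
        for path in commit.files:
            if not path.endswith("rule.md"):
                continue  # assumption: only rule markdown files are cached
            store.rule_history[path] = RuleHistoryRow(
                path, commit.changed_by, commit.changed_at, commit.github_username
            )
            updated.add(path)
    # Move the pointer only after the cache is updated, so a failed run
    # re-processes the same range next time.
    store.sync_commit_hash = head_sha
    return len(updated)


def fetch_all(store: InMemoryStore) -> str:
    """Return every cached row as JSON, e.g. to be saved as history.json."""
    rows = sorted(store.rule_history.values(), key=lambda r: r.markdown_file_path)
    return json.dumps([asdict(r) for r in rows], indent=2)
```

Because each run only touches files changed since the stored hash, the cost of a sync grows with the number of new commits rather than with the total number of rules, which is the caching behaviour suggested earlier in the thread.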

📝 Architecture Diagram

Figure: Proposed architecture diagram

christianmorfordwaitessw commented 2 years ago

CC: @pierssinclairssw @JackDevAU @tkapa

An updated version of the proposed solution, saving the history data to Cosmos DB:

Figure: Updated proposed architecture diagram