piers-sinclair opened this issue 3 years ago
Hey all, @william-liebenberg and I are on the same page for this: there shouldn't be a history.json file. All the data is already in git, so we just need to query it out, preferably using GraphQL or as an extra step in the Gatsby build.
This would remove the need for the GitHub Action step entirely.
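As a rough illustration of "querying it out" of git, the sketch below asks git for the latest commit that touched a single rule file. It is a minimal sketch, assuming a full (non-shallow) local clone of SSW.Rules.Content. The function names are hypothetical, and it starts one git process per file, so a large number of rules would need batching or caching.

```python
import subprocess
from typing import Optional, Tuple


def parse_last_modified(output: str) -> Optional[Tuple[str, str]]:
    """Split "<author name>\t<ISO 8601 date>" into a tuple; None if git printed nothing."""
    output = output.strip()
    if not output:
        return None  # no commit touched the path (untracked or misspelt file)
    author, date = output.split("\t", 1)
    return author, date


def last_modified(repo_dir: str, path: str) -> Optional[Tuple[str, str]]:
    """Author name and author date of the most recent commit that touched `path`."""
    result = subprocess.run(
        # -1: newest commit only; %an = author name, %x09 = tab, %aI = strict ISO 8601 date
        ["git", "log", "-1", "--format=%an%x09%aI", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_last_modified(result.stdout)
```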
@bradystroud @wicksipedia @william-liebenberg
Hi All,
As per my conversation with @pierssinclairssw, we tried several ways to query the data and eventually ended up using the History.json solution.
- Option 1 (Read the modified file details from the GitHub repo files): At first, we intended to use a Gatsby plugin such as https://www.gatsbyjs.com/plugins/gatsby-source-filesystem to get the file information. This did not work because SSW.Rules is split across multiple GitHub repositories (SSW.Rules + SSW.Rules.Content) and the plugin was unable to read the SSW.Rules.Content repo.
- Option 2 (Use the GitHub API to get the modified details): Using the GitHub API was the next plan, as it exposed all the data we needed. Unfortunately, we hit the API rate limits (5000 points/hour) and it was very slow because we needed to recalculate the data for every rule. https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit
- Option 3 (Use a History.json file to store the modified details): This was the last option we tried, and it is similar to this solution: https://github.com/joshuatz/git-date-extractor
I think we still need some form of caching, so that we only fetch the timestamps/authors for updated rules rather than refreshing the data for every rule.
-Christian
As per my conversation with @pierssinclairssw @taineriley1, here are the options we have come up with. We don't think any of these are great.
- Option 1 - On load - Retrieve data using our own API
- Option 2 - On load - Retrieve data with the GitHub API (same as above but no cache)
- Option 3 - On build - PowerShell script
- Option 4 - On build - Store history in Azure Blob/Table/CDN Storage
- Option 5 - On build - Have a metadata file for each rule
cc: @adamcogan @christianmorfordwaitessw @william-liebenberg
As per my conversation with @wicksipedia, @pierssinclairssw and @taineriley1, we have decided not to use any of the above solutions.
This is why the above options are not ideal:
- Options 1 and 2 - On load - Cache management can be difficult and hard to maintain. Storage options could be expensive if user load is high (e.g. Cosmos DB)
- Option 3 - On build - The PowerShell script is too slow (~30 min)
- Option 4 - On build - Could have race conditions and will add more complexity
- Option 5 - On build - Will not solve the GitHub Action problem
Instead, we are going to try these alternatives:
- Alternative 1 - gatsby-source-git from a local folder on the build server
- Alternative 2 - gatsby-source-git to the local clone

As per my conversation with @pierssinclairssw, we have found that Alternative 1 doesn't work because you can't run git commands across repos.
Here is an updated version of Alternative 1:
SSW.Rules/.git must be deleted)
.git folder
gatsby-source-git on a local folder within the repo on the build server.

We have not investigated Alternative 2 much, but we would prefer Alternative 1 because, even if we can increase the efficiency of Alternative 2, it may not scale well, and it also requires manual folder manipulation in the repo on the build server.
You will need to add multiple checkout steps to the pipeline to get the other repo:
https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/multi-repo-checkout?view=azure-devops
Fixing the SSW Rules history data problem would have better ROI than fixing all the problems it creates. I used to spend ~16 hours a month fixing merge conflicts, helping rules editors with failing PRs, fixing empty history data on rules and more. These are all problems caused by the existing history data solution.
I estimate it would take ~32 hours to fix the history data.
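Taking both figures at face value, and assuming the ~16 hours/month of support work would largely disappear once the history data is fixed, the one-off effort would pay for itself in roughly two months:

$$
\text{payback} \approx \frac{32\ \text{hours (one-off fix)}}{16\ \text{hours/month (ongoing support)}} = 2\ \text{months}
$$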
CC: @bradystroud @wicksipedia @jakebayliss @adamcogan @christianmorfordwaitessw @JackDevAU
As per my conversation with Adam, I have been getting lots of calls every week from users thinking they have done something wrong because the GitHub Action fails.
We should treat this problem as the number 1 priority to be worked on next for Rules.
Hey! 👋 Just an FYI - I had this problem with a PR https://github.com/SSWConsulting/SSW.Rules.Content/pull/1697
CC: @bradystroud @wicksipedia @jakebayliss @adamcogan @christianmorfordwaitessw @JackDevAU @pierssinclairssw @taineriley1 @JackLeerson
(Checked by @bradystroud) Hi All,
As we all agree, keeping history.json in the git history is a terrible solution. The proposal:
Remove history.json from the git history, and generate it on demand from a cache via the deploy pipeline.
Sync History

| Commit Hash |
|---|
| 7929f45... |

(This will only be one row, acting as a pointer to the commit we have synced up to.)
Rule History Cache

| Markdown File Path | Changed By Display Name | Changed At Date Time | GitHub Username |
|---|---|---|---|
| address-formatting/rule.md | Adam Cogan | 2009-02-15T00:00:00Z | adamcogan |
| ... | ... | ... | ... |
Create the following Azure Functions:
- Sync (this will be the first part of the deploy pipeline)
  - Cache from the latest hash -> HEAD on master
  - git log from the hash -> HEAD, and find all files changed
- Fetch All

Add a build step to call the Azure Function and save the result as the history.json file. This means no changes are required for the Gatsby build step.
Figure: Proposed architecture diagram
CC: @pierssinclairssw @JackDevAU @tkapa
Here is an updated version of the proposed solution, which saves the history data to Cosmos DB:
Figure: Updated proposed architecture diagram
@bradystroud @william-liebenberg @wicksipedia
Sometimes a user is not recorded as the last modifier of a rule even though they just modified it. This happens because users aren't forced to enable GitHub Actions on their forks. Additionally, the created date is not part of this process, but it should be.
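Because git already records every change, both the created date and the last-modified details could in principle be derived on the build server, without relying on Actions running on contributors' forks. The sketch below is a minimal illustration under assumptions: changes have already been collected as (file path, display name, ISO 8601 timestamp) rows like the Rule History Cache table, every timestamp carries a UTC offset or "Z", and the function name and the created/createdBy/lastUpdated/lastUpdatedBy keys are hypothetical. A rule that was renamed would appear "created" at its rename unless the history is followed across renames.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def _to_datetime(timestamp: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to "+00:00" first. Timestamps are assumed to carry an offset.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def created_and_updated(
    changes: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, str]]:
    """Derive created/last-updated info per file from (path, changed by, changed at) rows.

    Rows may arrive in any order; the earliest change is treated as the creation
    and the latest as the last update.
    """
    result: Dict[str, Dict[str, str]] = {}
    for path, changed_by, changed_at in changes:
        entry = result.get(path)
        if entry is None:
            # First row seen for this file: it is both the earliest and latest so far.
            result[path] = {
                "created": changed_at, "createdBy": changed_by,
                "lastUpdated": changed_at, "lastUpdatedBy": changed_by,
            }
            continue
        when = _to_datetime(changed_at)
        if when < _to_datetime(entry["created"]):
            entry["created"], entry["createdBy"] = changed_at, changed_by
        if when > _to_datetime(entry["lastUpdated"]):
            entry["lastUpdated"], entry["lastUpdatedBy"] = changed_at, changed_by
    return result
```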
To solve these issues
AB#61012