SSWConsulting / SSW.Rules

Generator for ssw.com.au/rules
https://www.ssw.com.au/rules
MIT License

🔍 Meta Descriptions - Backfill all old rules with good descriptions #1307

Closed bradystroud closed 3 weeks ago

bradystroud commented 2 months ago

CC @bradystroud @JackDevAU @Aibono1225 @KristenHu

Write a script that goes through all the rules and adds a meta description based on the rule content (tip: use AI to help speed this up).

> [!WARNING]
> Using ChatGPT could be expensive (we need a description for 3000+ rules). Consider other options, e.g. a local LLM.

### Tasks
- [x] Build a script
- [x] Test it on 10 rules
- [x] Get descriptions approved by marketing person
- [x] Then run on all the rules
- [x] Get Tiago to check them every time he looks at a rule

Even though this will only be run once, store the script in the repo.

As per my conversation with @JackDevAU and @Aibono1225, we considered doing this 10 rules at a time, but this would take too long. Since this is urgent, it is better to ship the rules with unchecked generated descriptions and refine them over time.

JackDevAU commented 1 month ago

As per my conversation with @bradystroud we are going to wait until the Website team completes this first: https://github.com/SSWConsulting/SSW.Website/issues/2594

bradystroud commented 1 month ago

Update:

Found a cool solution https://www.youtube.com/watch?v=e4V-heTEpE8 (11 min)

I'm still working on getting it perfect :)

bradystroud commented 4 weeks ago

Update:

Picked this up again today.

Script is being pushed to this branch https://github.com/SSWConsulting/SSW.Rules.Content/tree/seo-descriptions

I added some code to check each description after generating it, to ensure it's not terrible:

| Issue Description | Explanation |
| --- | --- |
| Exceeds 300 characters | 150 chars is recommended, but that is too hard for an AI to follow because it can't count |
| Contains the phrase 'Here is the ...' | The AI sometimes adds "here is the description I've generated for you" |
| Contains 'I've generated' | Similar to above, catching "I've generated", which is a formality from the AI |
| Contains odd characters `*` or `_` | Odd characters, normally markdown syntax like asterisks (`*`) or underscores (`_`) |

If a rule's description has any of these issues, it's added to a log file.

All the rules in the log file will need to be dealt with later.
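For anyone curious, the checks above can be sketched roughly like this. This is a minimal Python sketch, not the actual script on the branch: the function names `find_issues` and `log_issues`, the log line format, and the case-insensitive matching are all assumptions made for illustration. Only the four checks and the `seo_issues.log` file name come from this thread.

```python
import re

# Thresholds and phrases taken from the table above.
MAX_LENGTH = 300
BANNED_PHRASES = ("here is the", "i've generated")
ODD_CHARACTERS = re.compile(r"[*_]")  # leftover markdown syntax


def find_issues(description: str) -> list[str]:
    """Return a list of problems found in a generated description.

    Hypothetical helper: an empty list means the description passed.
    """
    issues = []
    if len(description) > MAX_LENGTH:
        issues.append("Exceeds 300 characters")
    lowered = description.lower()  # assumption: match phrases case-insensitively
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            issues.append(f"Contains the phrase '{phrase}'")
    if ODD_CHARACTERS.search(description):
        issues.append("Contains odd characters * or _")
    return issues


def log_issues(rule_path: str, description: str,
               log_path: str = "seo_issues.log") -> list[str]:
    """Append one line to the log file if the description has issues."""
    issues = find_issues(description)
    if issues:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{rule_path}: {'; '.join(issues)}\n")
    return issues
```

A description that passes returns an empty list, so the caller can decide whether to write it to the rule or leave the rule for manual follow-up.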

bradystroud commented 4 weeks ago

This morning I merged in the changes to 3000+ rules at once.

Rules has a build step that makes a copy of this history. That step started failing because of the 3000+ changed files. I tried to resolve this by skipping that build step in #1365, but that caused more problems.

I need to undo the commit that added the changes, then submit the changes in chunks (100 rules at a time).
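Splitting the changed rules into chunks of 100 could look something like the sketch below. It is only an illustration in Python: the helper name `chunked` and the example paths are hypothetical, and how each chunk then gets committed and merged is left out.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int = 100) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical paths, only for demonstration.
    rule_paths = [f"rules/rule-{i}/rule.md" for i in range(250)]
    for number, batch in enumerate(chunked(rule_paths), start=1):
        print(f"Batch {number}: {len(batch)} rules")
```

The last chunk simply holds whatever is left over, so no rule is skipped when the total is not a multiple of 100.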

bradystroud commented 3 weeks ago

Update: this is taking longer than expected due to a few new issues:

#1368 #1367

bradystroud commented 3 weeks ago

🥳 Done! (mostly)

I have shipped 3,150 rules with descriptions - the remaining ones are rules the AI struggled to generate a description for. I have moved these to a new issue #1378

https://github.com/SSWConsulting/SSW.Rules.Content/blob/main/scripts/generateSeoDescriptions/seo_issues.log

Figure: ChatGPT rule has a generated description

I have also created a new issue to do the same for categories #1377