agrc / sweeper

🧹A cli tool for making data good 🧹
MIT License

Profanity check SGID data #107

Open gregbunce opened 1 year ago

gregbunce commented 1 year ago

It would be helpful to have a function that scans the names in the data (trailheads, trail names, place names, etc.) for derogatory terms. This could be a good opportunity to leverage AI.
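As a rough starting point before any AI service is involved, the name scan could be sketched as a plain word-list check. This is only a sketch under assumptions: `find_flagged_names` and `BLOCKLIST` are hypothetical names, not part of sweeper, and the placeholder terms stand in for a vetted list.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical placeholder terms; a real list would come from a vetted source.
BLOCKLIST: Set[str] = {"badword", "slur"}


def find_flagged_names(
    names: Iterable[Optional[str]],
    blocklist: Iterable[str] = BLOCKLIST,
) -> List[Tuple[str, List[str]]]:
    """Return (name, matched_terms) for each name containing a blocklisted word.

    Matching is on whole, lowercased tokens, so a term embedded in a longer
    word is not flagged.
    """
    terms = {term.lower() for term in blocklist}
    flagged = []
    for name in names:
        if not name:
            # Skip null or empty attribute values.
            continue
        tokens = set(re.findall(r"[a-z0-9']+", name.lower()))
        hits = sorted(tokens & terms)
        if hits:
            flagged.append((name, hits))
    return flagged


if __name__ == "__main__":
    sample = ["Badword Canyon Trailhead", "Mill Creek", None]
    for name, hits in find_flagged_names(sample):
        print(f"{name}: {', '.join(hits)}")
```

Whole-token matching keeps false positives down but misses misspellings and multi-word phrases, which is where a moderation model might help.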

gregbunce commented 1 year ago

FYI: we do have a derogatory name in the trailheads data; it's a former name. I'm working on cleaning it up now.

gregbunce commented 1 year ago

A possible solution to look at: https://github.com/surge-ai/profanity

steveoh commented 1 year ago

I think we'd probably stick to GCP or maybe AWS.

https://cloud.google.com/natural-language/docs/moderating-text

gregbunce commented 7 months ago

Moving this to FY25 Q1; hopefully things will settle down a bit by then so we can make some progress on it.