StampyAI / alignment-research-dataset

Stampy's copy of Alignment Research Dataset scraper
https://huggingface.co/datasets/StampyAI/alignment-research-dataset
MIT License

Fix titles #158

Open ccstan99 opened 1 year ago

ccstan99 commented 1 year ago

Strip newlines from titles. Also consider stripping non-alphanumeric characters from the title and lowercasing it ("uncase") when making hash_ids, to catch more duplicates.
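For illustration, the normalization described above could look roughly like the sketch below. The function names (`clean_title`, `title_key`, `make_hash_id`) are hypothetical, and hashing only the normalized title with SHA-256 is an assumption; the repository's actual hash_id logic may use other fields or another hash.

```python
import hashlib
import re
import unicodedata


def clean_title(title: str) -> str:
    """Replace newlines and other whitespace runs with single spaces (for display)."""
    return re.sub(r"\s+", " ", title).strip()


def title_key(title: str) -> str:
    """Build a comparison key: lowercase, alphanumeric characters only."""
    # NFKC folds visually equivalent Unicode forms together (assumption: desirable here).
    normalized = unicodedata.normalize("NFKC", title).lower()
    return "".join(ch for ch in normalized if ch.isalnum())


def make_hash_id(title: str) -> str:
    """Hypothetical hash_id: SHA-256 hex digest of the normalized title key."""
    return hashlib.sha256(title_key(title).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "Fix\nTitles!"
    b = "fix titles"
    print(clean_title(a))
    print(make_hash_id(a) == make_hash_id(b))
```

The trade-off is that a more aggressive key catches more duplicates but can also merge distinct items whose titles differ only in punctuation or case, so it may be worth keeping the cleaned title for display and using the key only for hashing.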

Thomas-Lemoine commented 1 year ago

So there are two places where changes can be made: