StampyAI / alignment-research-dataset

Stampy's copy of Alignment Research Dataset scraper
https://huggingface.co/datasets/StampyAI/alignment-research-dataset
MIT License

agentmodels working urls by using github urls when website ones are broken #184

Open Thomas-Lemoine opened 12 months ago

Thomas-Lemoine commented 12 months ago

https://discord.com/channels/677546901339504640/1125882422731472896/1150201712775282840
https://discord.com/channels/677546901339504640/1125882422731472896/1150204154388688967

These Discord messages mostly explain the issue. Some of the articles with source 'agentmodels' have a URL that isn't valid, because the GitHub repo https://github.com/agentmodels/agentmodels.org/tree/gh-pages doesn't have exactly the same chapters as the website, https://agentmodels.org/.
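A rough sketch of the fallback idea, in case it helps: try the website URL first and only use the GitHub URL if the website one is broken. The function names (`url_is_reachable`, `pick_working_url`) are hypothetical and not taken from the ARD code, and how the two candidate URLs get built for each chapter is left out because the chapter lists don't match exactly.

```python
import urllib.request
from typing import Callable, Optional


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds without an HTTP error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return False


def pick_working_url(
    website_url: str,
    github_url: Optional[str],
    is_reachable: Callable[[str], bool] = url_is_reachable,
) -> Optional[str]:
    """Prefer the website URL; fall back to the GitHub URL if the site one is broken.

    Returns None when neither candidate resolves, so the caller can decide
    what to do with articles that have no valid public URL.
    Hypothetical helper, not part of the existing scraper.
    """
    if is_reachable(website_url):
        return website_url
    if github_url is not None and is_reachable(github_url):
        return github_url
    return None
```

The `is_reachable` parameter is only there so the check can be swapped out (for example with a cached lookup) instead of hitting the network for every article.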

Moreover, in the ARD-browser, the agentmodels articles all have the title "Modeling Agents with Probabilistic Programs". As Henri mentioned in #62, it might be better for the titles to be something like 'Modeling Agents with Probabilistic Programs - Chapter 1: Introduction'.
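For the titles, something along these lines might be enough. This is a minimal sketch: `chapter_title` is a hypothetical helper, and it assumes the chapter number and chapter name are already available for each article, which may not be true in the current scraper.

```python
BOOK_TITLE = "Modeling Agents with Probabilistic Programs"


def chapter_title(chapter_number: int, chapter_name: str, book_title: str = BOOK_TITLE) -> str:
    """Build a per-chapter title in the format suggested in this issue."""
    return f"{book_title} - Chapter {chapter_number}: {chapter_name}"


print(chapter_title(1, "Introduction"))
# Modeling Agents with Probabilistic Programs - Chapter 1: Introduction
```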

Thomas-Lemoine commented 12 months ago

Using the GitHub link might not work for chapter 6 in particular. It looks like chapter 6 is on neither GitHub nor the agentmodels website, but is stored in the ARD.

Thomas-Lemoine commented 12 months ago

In fact, it seems like agentmodels.org was added as a submodule in data/raw, but I can't open it. Not sure what that means.

ccstan99 commented 12 months ago

Looks like it got lost in the shuffle at some point... data/raw is no longer in the ARD repo but on gdrive, so the location in the code needs to be updated to look in gdrive.