AlignmentResearch / KataGoVisualizer

MIT License

sgf-viewer: Update unhardened games + add baseline attack games #25

Closed tomtseng closed 1 year ago

tomtseng commented 1 year ago
netlify[bot] commented 1 year ago

Deploy Preview for goattack ready!

Latest commit: ea8128c785ef2512949157a7a3a2d0b25a3eb1c0
Latest deploy log: https://app.netlify.com/sites/goattack/deploys/6392b63041e7100008689cef
Deploy Preview: https://deploy-preview-25--goattack.netlify.app/

kellinpelrine commented 1 year ago

Something may be breaking in Game Analysis -> How the victim's predicted win rate varies over time. The moves look really strange, and the viewer won't let me go past move 143. Picture below:

[image]

kellinpelrine commented 1 year ago

Game Analysis -> Summary is currently empty. Do you have something intended to be here? If not I can write something.

I'd also consider adding a summary for the baseline attacks, since, as we keep adding more content to the website, it could become harder for the audience to see the big picture without help. E.g. (maybe there's a better way to phrase this, though):

In this section we examine simple, no-learning attacks. These test the robustness of KataGo to some types of unsophisticated but likely out-of-distribution play. We find these attacks are generally ineffective against the hardened version of KataGo, although the mirror go attack still gets some wins at low visits. Overall, to find consistent weaknesses, a more powerful approach like ours seems necessary.

ed1d1a8d commented 1 year ago

@kellinpelrine I think having a summary for the Game Analysis would be helpful, feel free to take a stab at writing one. And I like your summary for the baseline section. Will add to the site.

ed1d1a8d commented 1 year ago

Fixed https://github.com/HumanCompatibleAI/KataGoVisualizer/pull/25#issuecomment-1343725939 via 621b96c.