ethz-spylab / rlhf-poisoning

Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
https://arxiv.org/pdf/2311.14455.pdf
Apache License 2.0

code understanding #9

Open hanbaoergogo opened 1 month ago

hanbaoergogo commented 1 month ago

I would like to read your code and make changes to it. Do you have any suggestions on where to start? Could you explain what the classes defined in safe-rlhf represent, such as AutoModelForScore and PreferenceDataset? I'd also like to ask how you learned to write such well-structured object-oriented code. Thanks!
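For readers with the same question, here is the general idea in sketch form. A preference dataset, as commonly used for reward-model training, pairs each prompt with a preferred and a rejected response. A score model (which AutoModelForScore presumably provides) outputs one scalar per prompt+response and is trained so the preferred response scores higher. The plain-Python sketch below assumes that setup. PreferenceSample, collate_pairs and pairwise_loss are hypothetical names, not safe-rlhf's actual API, and the float scores stand in for a model's outputs. The loss shown is the standard pairwise (Bradley-Terry style) objective, -log sigmoid(r_better - r_worse).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PreferenceSample:
    # Hypothetical container: one prompt with a preferred and a rejected answer.
    prompt: str
    better: str
    worse: str


def collate_pairs(samples: List[PreferenceSample]) -> Tuple[List[str], List[str]]:
    """Split a batch into 'better' and 'worse' full texts, aligned by index."""
    better = [s.prompt + s.better for s in samples]
    worse = [s.prompt + s.worse for s in samples]
    return better, worse


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(better_scores: List[float], worse_scores: List[float]) -> float:
    """Mean of -log(sigmoid(r_better - r_worse)) over the batch."""
    # -log(sigmoid(d)) == log(1 + exp(-d)) == softplus(-d)
    losses = [softplus(-(rb - rw)) for rb, rw in zip(better_scores, worse_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch = [PreferenceSample("Q: 2+2? A:", " 4", " 5")]
    better, worse = collate_pairs(batch)
    print(better)                                   # ['Q: 2+2? A: 4']
    print(round(pairwise_loss([0.0], [0.0]), 4))    # 0.6931 (= log 2)
```

Equal scores give a loss of log 2; the loss shrinks as the preferred response's score pulls ahead of the rejected one.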

hanbaoergogo commented 1 month ago

What is the idea behind using the _BaseAutoModelClass class?
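As far as the Hugging Face transformers design goes, _BaseAutoModelClass (in transformers.models.auto.auto_factory) is the private base of the AutoModel* classes. A subclass supplies a `_model_mapping`, and `from_pretrained` loads the checkpoint's config, looks its type up in that mapping, and instantiates the matching concrete model class. A custom auto class such as AutoModelForScore can plausibly reuse this machinery just by pointing it at its own mapping. The sketch below reproduces only that dispatch idea in plain Python. Every class in it (ToyConfig, BaseAutoClass, AutoToyModelForScore, and the model stand-ins) is hypothetical, and it neither loads weights nor uses transformers.

```python
from typing import Dict, Type


class ToyConfig:
    """Hypothetical stand-in for a model config; only model_type matters here."""
    def __init__(self, model_type: str) -> None:
        self.model_type = model_type


class LlamaForScore:
    """Hypothetical concrete score model for one architecture."""
    def __init__(self, config: ToyConfig) -> None:
        self.config = config


class GPT2ForScore:
    """Hypothetical concrete score model for another architecture."""
    def __init__(self, config: ToyConfig) -> None:
        self.config = config


class BaseAutoClass:
    """Hypothetical analogue of _BaseAutoModelClass: shared dispatch logic only."""
    _model_mapping: Dict[str, Type] = {}

    def __init__(self) -> None:
        # Auto classes are factories; they are never instantiated directly.
        raise EnvironmentError(f"{type(self).__name__} is a factory; use from_config().")

    @classmethod
    def from_config(cls, config: ToyConfig):
        try:
            model_class = cls._model_mapping[config.model_type]
        except KeyError:
            raise ValueError(
                f"Unrecognized model_type {config.model_type!r} for {cls.__name__}"
            ) from None
        return model_class(config)


class AutoToyModelForScore(BaseAutoClass):
    # Each Auto* subclass only swaps in its own table; the lookup code is inherited.
    _model_mapping = {"llama": LlamaForScore, "gpt2": GPT2ForScore}


if __name__ == "__main__":
    model = AutoToyModelForScore.from_config(ToyConfig("llama"))
    print(type(model).__name__)  # LlamaForScore
```

The design payoff is that adding a new task head means writing one mapping and one tiny subclass, while the config-to-class resolution stays in a single place.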

hanbaoergogo commented 1 month ago

I also can't find where the data is loaded from.
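One common reason data loading is hard to spot in code of this style is name-based registration. In safe-rlhf, datasets appear to be selected by a name passed to the training script: each dataset is a subclass declaring a NAME, subclasses register themselves when the class is defined, and the script looks the name up. How this repository wires its poisoned data into that mechanism has not been checked here, so the sketch below only illustrates the pattern. RawDatasetBase, ToyHarmlessDataset and load_by_name are hypothetical names, and the in-memory rows stand in for whatever file or hub dataset a real subclass reads.

```python
from typing import ClassVar, Dict, Iterator, List, Type


class RawDatasetBase:
    """Hypothetical base class: subclasses register themselves under NAME."""
    NAME: ClassVar[str]  # annotation only; concrete subclasses assign it
    _registry: ClassVar[Dict[str, Type["RawDatasetBase"]]] = {}

    def __init_subclass__(cls, **kwargs) -> None:
        # Runs once per subclass definition, so registration is automatic.
        super().__init_subclass__(**kwargs)
        name = getattr(cls, "NAME", None)
        if name is not None:
            if name in RawDatasetBase._registry:
                raise ValueError(f"duplicate dataset name {name!r}")
            RawDatasetBase._registry[name] = cls

    @classmethod
    def load_by_name(cls, name: str) -> "RawDatasetBase":
        try:
            dataset_cls = RawDatasetBase._registry[name]
        except KeyError:
            known = sorted(RawDatasetBase._registry)
            raise ValueError(f"unknown dataset {name!r}; known: {known}") from None
        return dataset_cls()

    def rows(self) -> List[dict]:
        raise NotImplementedError

    def __iter__(self) -> Iterator[dict]:
        return iter(self.rows())


class ToyHarmlessDataset(RawDatasetBase):
    NAME = "toy-harmless"

    def rows(self) -> List[dict]:
        # A real subclass would read from disk or a dataset hub here.
        return [{"prompt": "Hi", "better": " Hello!", "worse": " Go away."}]


if __name__ == "__main__":
    ds = RawDatasetBase.load_by_name("toy-harmless")
    print(list(ds)[0]["prompt"])  # Hi
```

Under this pattern, the place to look for "where the data comes from" is the body of the subclass whose NAME matches the dataset argument, not the training loop itself.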