Question about StepDPOTrainer

Hi, thanks for your work! I have a small question.

It seems that you have implemented a StepDPOTrainer class that inherits from trl's DPOTrainer and defines a 'tokenize_row' function. However, DPOTrainer does not have a 'tokenize_row' function; it belongs to 'OnlineDPOTrainer'. So I wonder whether the 'tokenize_row' in StepDPOTrainer is actually called during your training.
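To make the concern concrete, here is a minimal sketch of how one could check whether a subclass method really overrides something in its parent. The classes below are hypothetical stand-ins, not the actual trl or Step-DPO classes, and the helper names `defining_class` and `overrides_parent` are my own.

```python
import inspect


def defining_class(cls, method_name):
    """Return the first class in cls's MRO that defines method_name, or None."""
    for klass in inspect.getmro(cls):
        if method_name in vars(klass):
            return klass
    return None


def overrides_parent(cls, method_name):
    """True if cls defines method_name itself AND some ancestor also defines it."""
    if method_name not in vars(cls):
        return False
    ancestors = inspect.getmro(cls)[1:]  # skip cls itself
    return any(method_name in vars(base) for base in ancestors)


# Hypothetical stand-ins for illustration only.
class BaseWithHook:
    def tokenize_row(self, feature):
        return feature


class BaseWithoutHook:
    pass


class StepTrainerA(BaseWithHook):
    # Overrides a method the parent defines, so parent code calling
    # self.tokenize_row(...) would dispatch here.
    def tokenize_row(self, feature):
        return {"custom": feature}


class StepTrainerB(BaseWithoutHook):
    # The parent has no tokenize_row, so nothing in the parent calls this
    # method; it only runs if the subclass invokes it explicitly.
    def tokenize_row(self, feature):
        return {"custom": feature}


print(overrides_parent(StepTrainerA, "tokenize_row"))
print(overrides_parent(StepTrainerB, "tokenize_row"))
```

With trl installed, the same helper could be called as `overrides_parent(StepDPOTrainer, "tokenize_row")`. The answer may depend on the installed trl version, so it would be worth running against the version pinned for this repository.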

This may not affect your results, but I am curious about your 'tokenize_row' function: what does it do? Did you perhaps intend to use OnlineDPOTrainer?

Best wishes!

dvlab-research / Step-DPO