Possible bug in rank_datasets

Hi,

I am working on running RM training with WebGPT data.

I think I found a small bug (apologies if this duplicates an existing issue).

In https://github.com/LAION-AI/Open-Assistant/blob/8a461b9d7b290593af725b66398146049febfa49/model/reward/instructor/rank_datasets.py#L104

the check `if question not in self.index2question` is always True, because `question` is a string while the keys of `index2question` are integers. It should be something like:

for i, row in enumerate(dataset):
    ...
    if i not in self.index2question:
        ....
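
To make the problem easy to reproduce without loading the dataset, here is a minimal, self-contained sketch. It assumes the loop stores each question under the integer key `len(self.index2question)`, which is my reading of the linked code. The names `rows`, `build_index_buggy`, `build_index_dedup`, and `question2index` are hypothetical and only for illustration. The second function is one possible way to deduplicate by question text using a reverse mapping, not necessarily the fix the maintainers would choose.

```python
# Hypothetical stand-in for the question strings in the dataset rows.
rows = ["q1", "q2", "q1"]


def build_index_buggy(questions):
    """Mimics the reported pattern: str membership test on an int-keyed dict."""
    index2question = {}
    for question in questions:
        # `in` on a dict looks at keys (ints here), so a str is never found
        # and this branch runs for every row, including duplicates.
        if question not in index2question:
            index2question[len(index2question)] = question
    return index2question


def build_index_dedup(questions):
    """One possible alternative: keep a reverse map keyed by question text."""
    index2question = {}
    question2index = {}  # hypothetical helper dict, not in the original code
    for question in questions:
        if question not in question2index:
            idx = len(index2question)
            question2index[question] = idx
            index2question[idx] = question
    return index2question


buggy = build_index_buggy(rows)
dedup = build_index_dedup(rows)
print(len(buggy), len(dedup))
```

With the three sample rows, the first version keeps the duplicate question under a new index, while the second stores each distinct question once.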

I hope it helps.

Possible bug in rank_datasets #2210