LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Possible bug in rank_datasets #2210

Closed ghtaro closed 1 year ago

ghtaro commented 1 year ago

Hi,

I am working on running RM training with WebGPT data.

I think I found a small bug (apologies if this duplicates another issue).

In https://github.com/LAION-AI/Open-Assistant/blob/8a461b9d7b290593af725b66398146049febfa49/model/reward/instructor/rank_datasets.py#L104

The check if question not in self.index2question always evaluates to True, because question is a string and the keys of index2question are integers. It should be something like:

for i, row in enumerate(dataset):
    ...
    if i not in self.index2question:
        ...
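To make the membership problem concrete, here is a minimal, self-contained sketch. It does not reproduce the code in rank_datasets.py: the sample rows and the function names build_index_buggy and build_index_dedup are hypothetical, and the second function assumes the intent is to store each distinct question once, which is a different approach from the index-based check suggested above.

```python
# Hypothetical sample rows; not taken from the WebGPT dataset.
rows = ["What is RLHF?", "What is RLHF?", "What is a reward model?"]


def build_index_buggy(rows):
    """Mimics the reported pattern: int keys, but a str membership test."""
    index2question = {}
    for question in rows:
        # `in` on a dict checks keys. The keys are ints, so a str is
        # never found and this condition is True on every iteration.
        if question not in index2question:
            index2question[len(index2question)] = question
    return index2question


def build_index_dedup(rows):
    """One possible alternative, assuming each question should appear once."""
    index2question = {}
    seen = set()  # tracks the question strings already stored
    for question in rows:
        if question not in seen:
            seen.add(question)
            index2question[len(index2question)] = question
    return index2question


buggy = build_index_buggy(rows)
dedup = build_index_dedup(rows)
print(len(buggy), len(dedup))
```

In the buggy version the duplicate question is stored twice, because the string is compared against integer keys and never matches.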

I hope it helps.

sanagno commented 1 year ago

@theblackcat102 good catch