CarperAI / cheese

Used for adaptive human in the loop evaluation of language and embedding models.
MIT License

`instruct_hf_pipeline` example returns rankings of `None` #51

Open KastanDay opened 1 year ago

KastanDay commented 1 year ago

Describe the bug: Printing the results inside extract_data() shows that the rankings are None. The rankings are also missing from the generated rankings_dataset file.

To reproduce:

  1. Run the default instruct_hf_pipeline example, e.g. python -m examples.instruct_hf_pipeline

Expected behavior: rankings should be a list of ints corresponding to the human labeler's decisions.

Example buggy result (printing inside extract_data()):

{
  "query": "hat has inspired you to become a speaker? How important is your own English knowledge base to you",
  "completions": [
    "? So, how is a new speaker's grammar an essential tool in how you plan to speak?\n\nLangston is a student who writes English for all students, and so it is all about teaching the new speaker to think out loud. That is what he started doing two years ago when he learnt that his grammar was going to be different from that of their world renowned schoolwork teacher.\n",
    "?\n\nMe, and I am a fluent speaker. It is a privilege to be a speaker. Having some knowledge of English is important because while I get compliments for reading so many books I am already on a conversational train and I often find myself saying things I do not like in English. In order to achieve my own speech perfection, it can be hard to get my English to speak in English",
    "?\n\nYes, I speak more and I also like to communicate with other people in the community as a whole. I learned so much as a teenager living in Japan that I really don't understand what my Japanese does. But when I meet people, I just try and say English because I like speaking Japanese, I like seeing them on TV. I love hearing their opinions, even though they are ignorant",
    "?\n\nThe most important thing is learning to speak, even if it doesn't mean that much. The second person to do is to look at the problem and do something with it. The first person will look for something, and if possible, look at where it started. If you learn how to look for the first person in a sentence you can use the search function to find them, and if",
    "?\n\nThis question has been asked and answered with very clear answers as to simply what the speakers speak English. One can use this knowledge as a basis for a wide variety of topics, as in reading the book for five minutes, or as part of the job posting. For those who only need a few hours of reading a book a week, here is a brief discussion of a topic within English at"
  ],
  "rankings": null
}

The LMGenerationElement reports no error (error=False), but its rankings field is None.

LMGenerationElement(client_id=1, trip=1, trip_start='client', trip_max=1, error=False, start_time=1674231730.971844, end_time=1674231740.1409464, query='Write a quote on the floor', 
completions=[...omit for brevity...], rankings=None)

If I solve the issue I'll comment here. Thx.

KastanDay commented 1 year ago

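Solution: The bug appears to originate in the receive() function, which assigns the rankings to the task itself rather than to task.data, so the LMGenerationElement never receives them.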

    # bad:
    task.rankings = pressed_vals
    # good:
    task.data.rankings = pressed_vals
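To illustrate why the bad version fails silently, here is a minimal, self-contained sketch. The `Task` class and the trimmed-down `LMGenerationElement` below are hypothetical stand-ins written for this example (only the `query`, `completions`, and `rankings` fields from the output above are modeled); they are not the actual cheese classes. The point is that Python lets you set a new attribute on an ordinary object, so the assignment raises no error while `task.data.rankings` stays `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LMGenerationElement:
    # Hypothetical stand-in: only the fields shown in this issue.
    query: str = ""
    completions: List[str] = field(default_factory=list)
    rankings: Optional[List[int]] = None


@dataclass
class Task:
    # Hypothetical stand-in for the wrapper that carries the data element.
    data: LMGenerationElement


def receive_bad(task, pressed_vals):
    # Creates a brand-new attribute on the wrapper; task.data is untouched.
    task.rankings = pressed_vals
    return task


def receive_good(task, pressed_vals):
    # Writes into the data element that is later read back.
    task.data.rankings = pressed_vals
    return task


bad = receive_bad(Task(LMGenerationElement("q", ["a", "b"])), [2, 1])
good = receive_good(Task(LMGenerationElement("q", ["a", "b"])), [2, 1])

print(bad.data.rankings)   # None
print(good.data.rankings)  # [2, 1]
```

Under these assumptions, anything that later reads only `task.data` would see `rankings=None` with `error=False`, which matches the symptom reported above.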

My suggested solution:

    def receive(self, *inp):
        _, task, pressed_vals = inp
        task.data.rankings = pressed_vals
        return task

The present() function instead uses the style data : LMGenerationElement = task.data, which may be preferable but seems too verbose in this case.
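For comparison, a sketch of the more verbose variant with a typed local, as present() is described as doing. `ExampleFront`, `Task`, and the reduced `LMGenerationElement` are hypothetical stand-ins so the snippet runs on its own; only the body of `receive` reflects the suggestion in this comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LMGenerationElement:
    # Hypothetical stand-in: only the fields shown in this issue.
    query: str = ""
    completions: List[str] = field(default_factory=list)
    rankings: Optional[List[int]] = None


@dataclass
class Task:
    # Hypothetical stand-in for the wrapper that carries the data element.
    data: LMGenerationElement


class ExampleFront:
    # Hypothetical class; only receive() is modeled here.
    def receive(self, *inp):
        _, task, pressed_vals = inp
        # The annotation documents that rankings live on the data element.
        data: LMGenerationElement = task.data
        data.rankings = pressed_vals
        return task


task = Task(LMGenerationElement("q", ["a", "b"]))
result = ExampleFront().receive(None, task, [1, 0])
```

Both variants write to the same object; the typed local mainly helps editors and type checkers flag a misplaced attribute.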