Detoxification inference returns error

BunnyNoBugs commented 1 year ago

Hello!

I was testing the library on detoxification, specifically your Inference from pretrained prompt (RUSSE Detox) tutorial. When I run it in Colab, inference fails with a KeyError:

Could the problem be that a specific version of the datasets library is required? The notebook assumes it is already installed, but I had to install it separately (the latest version).

konodyuk commented 1 year ago

The problem here is that the prompt format of the pretrained prompt "konodyuk/prompt_rugpt3large_detox_russe" is defined as <P*X>{toxic_comment}<P*X>. If it were <P*X>{text}<P*X>, the code would work, because {text} is the default placeholder. For non-default placeholders, you need to pass a dictionary that contains the placeholder key. In this case the code should be

for i in tqdm(valid_dataset):
    options = ppln(
        i,  # the full dataset row (a dict), which includes the "toxic_comment" key
        ...

or

for i in tqdm(valid_dataset):
    options = ppln(
        {"toxic_comment": i["toxic_comment"]},
        ...

if you prefer to be more explicit.
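The placeholder behaviour can be illustrated with plain Python string formatting. This is a minimal sketch, assuming the pipeline fills the prompt format roughly the way `str.format` does. `fill_prompt` and `placeholder_keys` are hypothetical helpers, not part of ru-prompts, and the real library may interpolate differently. Here `<P*X>` is treated as literal text, whereas in the library it marks prompt tokens. The sketch reproduces the same kind of KeyError and shows why passing a dict with the right key (or the whole dataset row) works.

```python
import string

# Prompt format described in this thread for
# "konodyuk/prompt_rugpt3large_detox_russe".
TEMPLATE = "<P*X>{toxic_comment}<P*X>"


def placeholder_keys(template):
    """Return the set of {placeholder} names used in a format string."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def fill_prompt(template, inputs):
    """Hypothetical stand-in for the pipeline's interpolation step.

    A bare string is mapped to the default "text" key; a dict is used as-is.
    Extra keys in the dict are ignored by str.format.
    """
    if isinstance(inputs, str):
        inputs = {"text": inputs}
    return template.format(**inputs)


# A dataset row; "other_column" is a hypothetical extra field.
row = {"toxic_comment": "some comment", "other_column": "ignored"}

try:
    # Only the default "text" key is supplied, so {toxic_comment} cannot be filled.
    fill_prompt(TEMPLATE, row["toxic_comment"])
except KeyError as err:
    print("KeyError:", err)

print(fill_prompt(TEMPLATE, row))        # whole row: works
print(placeholder_keys(TEMPLATE))        # which keys the format expects
```

Checking `placeholder_keys` on a prompt format before running inference is one way to see which keys the input dict must contain.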

I think I mistakenly uploaded a prompt checkpoint with the wrong format at some point, so it has nothing to do with the datasets version.

I'll keep this issue open as a reminder to fix the notebook when I get a chance. Thank you for the report.

BunnyNoBugs commented 1 year ago

Thank you for your answer; inference works properly now!

ai-forever / ru-prompts

Detoxification inference returns error #3