eurekayuan / RigorLLM

Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) #2

Open Churrimorey opened 1 month ago

Churrimorey commented 1 month ago

When I ran data_gen.py, I got the error in the title. I suspect the original HEx-PHI dataset went through an additional preprocessing step. Could you share the processed dataset or the preprocessing code?

eurekayuan commented 1 week ago

You are right. The released HEx-PHI dataset is slightly different from the one we used. We do not have permission to redistribute the original dataset, but we can describe its format. Hope this helps.

The data for each category is stored in one JSONL file, named finetuned_5_epoch_100_shot_1_eval_<category_name>_gpt_4_judge.jsonl.

Each row in the JSONL file is a JSON string that can be loaded with json.loads:

{"system": "", "user": "", "model": "", "duo_score": 5, "duo_reason": ""}
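For anyone checking their own files against this format, here is a minimal sketch of a loader. It is not taken from data_gen.py: the helper names `category_path` and `load_jsonl` and the key check are assumptions made for illustration, and only the file-name pattern and the five keys come from the description above. It reports the file and line number when a row does not parse, which is easier to act on than a bare JSONDecodeError.

```python
import json
from pathlib import Path

# File-name pattern and keys as described in this comment; everything else
# (function names, error messages) is hypothetical and only for illustration.
FILE_PATTERN = "finetuned_5_epoch_100_shot_1_eval_{category}_gpt_4_judge.jsonl"
EXPECTED_KEYS = {"system", "user", "model", "duo_score", "duo_reason"}


def category_path(data_dir, category_name):
    """Build the expected JSONL path for one category."""
    return Path(data_dir) / FILE_PATTERN.format(category=category_name)


def load_jsonl(path):
    """Load one JSONL file, one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(
                    f"{path}, line {line_no}: not valid JSON ({err.msg})"
                ) from err
            if not isinstance(record, dict):
                raise ValueError(f"{path}, line {line_no}: expected a JSON object")
            missing = EXPECTED_KEYS - set(record)
            if missing:
                raise ValueError(
                    f"{path}, line {line_no}: missing keys {sorted(missing)}"
                )
            records.append(record)
    return records
```

Calling `load_jsonl(category_path(data_dir, category_name))` on each file should show quickly whether a file is in a different format, for example CSV or plain text, since the first unparsable line is reported by number.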

The issue can be solved by changing the file names.

P.S. I just noticed that they have removed CSAM from their dataset.