Hi,
I am currently trying to re-evaluate the ChatGPT-based metrics (by Luo). I am wondering how you converted the answers into boolean decisions. I would expect the chain-of-thought template in particular to produce responses that don't strictly follow a pattern.
As far as I can see, the paper by Luo is also not clear about how to do this conversion.
Did you look for keywords, or is there a specific pattern you evaluated? It would also be nice if you could share the code.
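To make the question concrete, below is the kind of naive keyword heuristic I would try myself. It is only my own sketch, and it assumes the prompt asks for a yes/no verdict, possibly on an "Answer:" line. The function name and patterns are hypothetical and not taken from your code or from the paper:

```python
import re
from typing import Optional

# Hypothetical heuristic (my assumption, not the method from the paper):
# prefer an explicit "Answer: Yes/No" line, else use the last yes/no word.
_ANSWER_LINE = re.compile(r"answer\s*[:\-]\s*(yes|no)\b", re.IGNORECASE)
_YES_NO = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def to_boolean(response: str) -> Optional[bool]:
    """Map a free-form ChatGPT response to True/False, or None if undecidable."""
    answers = _ANSWER_LINE.findall(response)
    if answers:
        # Take the last explicit answer line in case the model revises itself.
        verdict = answers[-1]
    else:
        # Chain-of-thought responses tend to state the verdict at the end,
        # so fall back to the last standalone "yes"/"no" token.
        hits = _YES_NO.findall(response)
        if not hits:
            return None
        verdict = hits[-1]
    return verdict.lower() == "yes"
```

Returning `None` lets undecidable responses be counted separately instead of silently defaulting to one class. A response like "No contradiction found, so yes." already shows how fragile such a heuristic is, which is why I'd like to know what you actually did.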
Kind regards