FreedomIntelligence / ChatGPT-Detection-PR-HPPT

Codes and dataset for the paper: Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

details about dataset HPPT, about levenshtein distance #1

Open RiverTre opened 1 year ago

RiverTre commented 1 year ago

Hi, could you share how you computed the Levenshtein distance, or the code you used? For example, in /ChatGPT-Detection-PR-HPPT/tree/main/Dataset/HPPT/val.json I can see Levenshtein distance values, but why are they in [0,1]? Did you use a normalized distance? How did you compute it? Thanks.

RiverTre commented 1 year ago

@Clement1290

Clement1290 commented 1 year ago

It was based on the project https://github.com/life4/textdistance; we used the textdistance package to calculate the normalized Levenshtein distance.
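For readers who want to see what a normalized Levenshtein distance usually means, here is a minimal pure-Python sketch. It assumes the common convention of dividing the edit distance by the length of the longer input, which is why the values fall in [0,1]. The function names are illustrative, and whether the dataset values were computed over characters or over words is not stated in this thread, so treat this as an approximation rather than the authors' exact procedure.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1.

    Works on any sequences (strings for character level, lists of tokens
    for word level).
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Distance divided by the longer length (assumed convention), in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty inputs are identical
    return levenshtein(a, b) / longest


print(normalized_levenshtein("kitten", "sitting"))
```

Passing `"a b c".split()` style lists instead of strings gives a word-level variant, which can produce quite different values for the same pair of texts.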

RiverTre commented 1 year ago

Thanks! I am exploring this research further, and I am using:

    !pip install rapidfuzz
    from rapidfuzz.distance import DamerauLevenshtein
    DamerauLevenshtein.normalized_distance(s1, s2)

But the output differs from the values in the data files. Would you be available to check it? I am confused and need help. The documentation is here: https://maxbachmann.github.io/RapidFuzz/Usage/distance/Levenshtein.html
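One possible source of the mismatch, which this thread does not confirm, is that the snippet above uses a Damerau-Levenshtein distance while the dataset values were described as plain Levenshtein. Damerau-style distances count a swap of two adjacent items as a single edit, so the two metrics can disagree. The sketch below illustrates the difference with a simplified restricted variant (optimal string alignment); it is a hypothetical illustration and does not reproduce the internals of rapidfuzz or textdistance.

```python
def edit_distance(a, b, transpositions=False):
    """Levenshtein distance, optionally allowing adjacent transpositions.

    With transpositions=True this is the restricted (optimal string
    alignment) variant, used here only for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def normalized(a, b, transpositions=False):
    """Distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b, transpositions) / longest if longest else 0.0


print(normalized("ab", "ba"))                       # plain Levenshtein
print(normalized("ab", "ba", transpositions=True))  # with transpositions
```

Other differences worth checking are the tokenization (characters versus words) and the normalization convention each library applies.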