This is our dataset and code for the paper: Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text
The detector is a RoBERTa sequence classification model with two labels (0: human, 1: ChatGPT-involved).
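The label convention above can be illustrated with a minimal sketch that turns a pair of class logits into a label name and a probability. It uses only the Python standard library. The names `ID2LABEL`, `softmax`, and `decode_prediction` are hypothetical helpers introduced for illustration and are not taken from this repository; the actual `inference.py` may post-process the model output differently.

```python
import math

# Label mapping as stated in this README (0: human, 1: ChatGPT-involved).
ID2LABEL = {0: "human", 1: "ChatGPT-involved"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (label, probability) for one pair of class logits.

    Hypothetical helper: assumes the classifier emits one logit per label,
    in the order given by ID2LABEL.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    # Pick the index of the most probable class.
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


if __name__ == "__main__":
    # Made-up logits for illustration only, not real model output.
    label, prob = decode_prediction([0.0, 2.0])
    print(label, round(prob, 3))
```

With a real model, the logits would come from the classifier's forward pass on a tokenized text; the decoding step itself would likely look similar.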
If you want to train it, follow these steps:
pip install -r requirements.txt
cd Detector
python train.py
You are free to change the settings in the code.
The file best_model.pt is the trained detector.
You can test custom samples by editing text_test.txt (it contains only three examples) and running:
python inference.py
If you do not want to train the model yourself, we provide our detector trained on HPPT: Trained Detector on Google Drive and Trained Detector on Huggingface.
To train the Polish Ratio (PR) model, run:
cd ../PR_reg
python train.py
We also provide the trained PR model: Trained PR model
You are welcome to use our dataset and models. If you do, please cite the following BibTeX entry:
@article{yang2023chatgpt,
title={Is chatgpt involved in texts? measure the polish ratio to detect chatgpt-generated text},
author={Yang, Lingyi and Jiang, Feng and Li, Haizhou},
journal={APSIPA Transactions on Signal and Information Processing},
volume={13},
number={2},
publisher={Now Publishers, Inc.}
}