AI-generated text boundary detection with RoFT

Welcome to the official repository for the paper AI-generated text boundary detection with RoFT.

Due to the rapid development of large language models, people increasingly encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between the human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in the literature. We attempt to bridge this gap and examine several ways to adapt state-of-the-art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits using the Real or Fake Text (RoFT) benchmark, which contains short texts on several topics and includes generations from various language models. We use this diversity to examine in depth the robustness of all detectors in cross-domain and cross-model settings, providing baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also identify which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.

In this repository, you can find both datasets used in the paper:

roft_duplicates_removed.csv - the dataset introduced in the paper Real or Fake Text? Investigating Human Ability to Detect Boundaries between Human-Written and Machine-Generated Text by Dugan et al., with duplicate entries (i.e., entries with repeated prompts and generations) removed by us. Don't forget to cite the original paper when using this dataset.
roft_chatgpt.csv - an augmented version of the RoFT dataset with ChatGPT-3.5 generations (our modification).

The repository also contains example implementations of the algorithms used in our paper:

roft_sliding_window_counting_PHD.ipynb - computes the PHD (persistent homology dimension) of text embeddings using a sliding-window technique. It also contains a function for computing the MLE (maximum likelihood estimate) of intrinsic dimension as an alternative.
roft_original_TimeSeriesSVR_classification_sliding_window_davinci.ipynb - applies a time-series SVM (TimeSeriesSVR) to the series obtained with the algorithm above.
neg_log_likelihoods.ipynb - builds classifiers on top of the negative log-likelihoods of the tokens in the text, computed with LLMs. Note that the perplexity.py script must be run first to compute these negative log-likelihoods.
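To make the sliding-window idea concrete, here is a minimal sketch of the MLE alternative, assuming it is the standard Levina-Bickel maximum likelihood estimator of intrinsic dimension applied to windows of consecutive token embeddings. It is not the notebook's code: the function names `mle_intrinsic_dim` and `sliding_window_dims`, the window/step/k defaults, and the synthetic "embeddings" are all hypothetical.

```python
import numpy as np


def mle_intrinsic_dim(points: np.ndarray, k: int = 5) -> float:
    """Levina-Bickel MLE of intrinsic dimension for one point cloud.

    points: shape (n_points, emb_dim), e.g. token embeddings of one window.
    k: number of nearest neighbours per point (hypothetical default).
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Distances to the k nearest neighbours (column 0 is the point itself).
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # Per-point inverse estimate: mean of log(T_k / T_j) for j = 1..k-1.
    inv_dims = np.log(knn[:, -1:] / knn[:, :-1]).sum(axis=1) / (k - 1)
    # Average the inverse estimates over points, then invert.
    return float(1.0 / inv_dims.mean())


def sliding_window_dims(embeddings: np.ndarray, window: int = 20,
                        step: int = 5, k: int = 5) -> np.ndarray:
    """Dimension estimate for each window of consecutive token embeddings."""
    starts = range(0, len(embeddings) - window + 1, step)
    return np.array([mle_intrinsic_dim(embeddings[s:s + window], k) for s in starts])


# Synthetic demo (not real data): 60 "tokens" on a 2-D subspace followed by
# 60 "tokens" on a 16-D subspace, both linearly embedded in a 32-D space.
rng = np.random.default_rng(0)
low = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 32))
high = rng.normal(size=(60, 16)) @ rng.normal(size=(16, 32))
embeddings = np.vstack([low, high])
dims = sliding_window_dims(embeddings)
print(np.round(dims, 2))
```

The output is one estimate per window, i.e. a time series over the text; in this toy setup the estimates for windows after token 60 are higher than those before it, which is the kind of shift a downstream model can use to locate a boundary.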
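For the PHD itself, a common estimator (and the one we assume here; the README does not spell out the notebook's exact procedure) uses the fact that the finite 0-dimensional persistence bars of a point cloud are the edges of its Euclidean minimum spanning tree. For n points sampled from a d-dimensional set, the sum of MST edge lengths raised to a power alpha grows like n^((d - alpha)/d), so fitting the slope m of log E against log n gives d = alpha / (1 - m). The sketch below implements that with plain NumPy; `mst_edge_lengths`, `phd_estimate`, and all parameter defaults are hypothetical.

```python
import numpy as np


def mst_edge_lengths(points: np.ndarray) -> np.ndarray:
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dists[0].copy()  # cheapest edge from each vertex into the tree
    lengths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        lengths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dists[j])
    return np.array(lengths)


def phd_estimate(points: np.ndarray, alpha: float = 1.0, n_sizes: int = 8,
                 n_reruns: int = 3, seed: int = 0) -> float:
    """PH0 dimension: fit log E_alpha(n) ~ m * log n, then d = alpha / (1 - m)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    log_n, log_e = [], []
    for size in np.linspace(n // 4, n, num=n_sizes, dtype=int):
        for _ in range(n_reruns):
            subset = points[rng.choice(n, size=size, replace=False)]
            # E_alpha: sum of MST edge lengths (= finite PH0 bars) to the power alpha.
            e_alpha = (mst_edge_lengths(subset) ** alpha).sum()
            log_n.append(np.log(size))
            log_e.append(np.log(e_alpha))
    slope = np.polyfit(log_n, log_e, deg=1)[0]
    return float(alpha / (1.0 - slope))


# Synthetic check: a 2-D and a 6-D uniform cloud, both embedded in 16-D.
rng = np.random.default_rng(1)
points_2d = rng.uniform(size=(300, 2)) @ rng.normal(size=(2, 16))
points_6d = rng.uniform(size=(300, 6)) @ rng.normal(size=(6, 16))
phd_low, phd_high = phd_estimate(points_2d), phd_estimate(points_6d)
print(round(phd_low, 2), round(phd_high, 2))
```

Prim's algorithm on a dense distance matrix is quadratic in the number of points, which is acceptable for short sliding windows; applying `phd_estimate` window by window yields a series analogous to the MLE one.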
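The README does not show what perplexity.py computes internally. As a hedged illustration, assuming it produces the usual per-token negative log-likelihoods of a causal language model, the sketch below computes them from raw logits with the standard shift-by-one convention and turns them into simple per-sentence features a classifier could consume. The function names and the mean/max features are hypothetical, not the notebook's actual feature set.

```python
import numpy as np


def token_nlls(logits: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """Per-token negative log-likelihoods from causal-LM logits.

    logits: shape (seq_len, vocab_size); row t scores the token at t + 1.
    token_ids: shape (seq_len,), the tokenized text.
    Returns seq_len - 1 values: the first token has no prediction.
    """
    scores = logits[:-1]
    # Numerically stable log-softmax over the vocabulary.
    scores = scores - scores.max(axis=-1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=-1, keepdims=True))
    targets = token_ids[1:]
    return -log_probs[np.arange(len(targets)), targets]


def perplexity(nlls: np.ndarray) -> float:
    """Perplexity is the exponential of the mean per-token NLL."""
    return float(np.exp(np.mean(nlls)))


def sentence_features(nlls: np.ndarray, sentence_lengths: list[int]) -> np.ndarray:
    """Mean and max NLL per sentence (hypothetical features for a classifier).

    sentence_lengths: number of NLL values belonging to each sentence;
    must sum to len(nlls).
    """
    features, start = [], 0
    for length in sentence_lengths:
        chunk = nlls[start:start + length]
        features.append([chunk.mean(), chunk.max()])
        start += length
    return np.array(features)


# Uniform logits over a 4-token vocabulary: every NLL equals log(4).
uniform_nlls = token_nlls(np.zeros((5, 4)), np.array([0, 1, 2, 3, 0]))
print(perplexity(uniform_nlls))  # approximately 4.0
```

A sentence-level feature matrix like this has one row per sentence, so a sudden jump in mean NLL between consecutive rows is one simple signal of where generated text may begin.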

Please note that in our experiments, we derived the test sets from both datasets using the train_test_split function from scikit-learn 1.4.1 together with NumPy 1.24.4:

from sklearn.model_selection import train_test_split
train_df, test_df, y_train, y_test = train_test_split(df, df["label"].astype(int), test_size=.2, random_state=42)

where df is a pandas DataFrame loaded from either roft_duplicates_removed.csv or roft_chatgpt.csv.

Cite the original paper in which the RoFT dataset was introduced (Real or Fake Text? Investigating Human Ability to Detect Boundaries between Human-Written and Machine-Generated Text) as:

@inproceedings{10.1609/aaai.v37i11.26501,
author = {Dugan, Liam and Ippolito, Daphne and Kirubarajan, Arun and Shi, Sherry and Callison-Burch, Chris},
title = {Real or Fake Text? Investigating Human Ability to Detect Boundaries between Human-Written and Machine-Generated Text},
year = {2023},
isbn = {978-1-57735-880-0},
publisher = {AAAI Press},
url = {https://doi.org/10.1609/aaai.v37i11.26501},
doi = {10.1609/aaai.v37i11.26501},
abstract = {As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.},
booktitle = {Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence},
articleno = {1432},
numpages = {9},
series = {AAAI'23/IAAI'23/EAAI'23}
}

Cite our paper (AI-generated text boundary detection with RoFT) as:

@misc{kushnareva2023aigenerated,
    title={AI-generated text boundary detection with RoFT},
    author={Laida Kushnareva and Tatiana Gaintseva and German Magai and Serguei Barannikov and Dmitry Abulkhanov and Kristian Kuznetsov and Eduard Tulchinskii and Irina Piontkovskaya and Sergey Nikolenko},
    year={2023},
    eprint={2311.08349},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
