This is the official repository for "Analyzing and Advancing Text Detection Tools for AI-Generated Text", a project created for a Master's thesis in Computer Science at the IT University of Copenhagen.
The repository proposes a method for detecting AI-generated text.
Disclaimer: The tool only works on English texts.
Some files are too large to be stored in the repository. This includes the models that can be used for generating calculations.
We suggest using the link here to download the models and placing them in the models directory.
Notice the flag GENERATE_NEW_DATA at line 11. If it is set to true, the program generates the data again. This is a time-consuming process and should only be done if you want to generate new data.
When it is set to false, the program reuses the data in the pickle files.
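The behaviour of such a flag can be pictured with a minimal sketch of a cache-or-regenerate pattern. Only the name GENERATE_NEW_DATA comes from the text above. The function `load_or_generate`, its arguments, and the choice to regenerate when the pickle file is missing are assumptions made for illustration and may differ from what the repository does.

```python
import pickle
from pathlib import Path

# Flag name taken from the README; everything else here is hypothetical.
GENERATE_NEW_DATA = False


def load_or_generate(path, generate, regenerate=GENERATE_NEW_DATA):
    """Return cached data from `path`, or regenerate it and cache the result.

    `generate` is a zero-argument callable standing in for the slow
    data-generation step.
    """
    path = Path(path)
    # Assumption: a missing pickle file also triggers regeneration.
    if regenerate or not path.exists():
        data = generate()
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(data, fh)
        return data
    # Reuse the previously stored data.
    with path.open("rb") as fh:
        return pickle.load(fh)
```

With this pattern, the expensive step runs only when the flag is set or when no cached file exists yet. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.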
No AI detector is perfect, and this tool is no exception. The tool is based on the assumption that AI-generated text is different from human-written text. This is not always the case, especially as AI evolves, so the tool might not detect AI-generated text in all cases. It should also be noted that it works on English texts only.
An explanation of some of the files and folders:
data_processing/
: Contains various functions used for generating, processing and cleaning data.
extracted_answers/
: Contains all answers from the datasets in .txt files.
functions
: Contains JavaScript files used for cleaning, processing and generating the data.
ai output
models/
: Contains the models used for generating calculations.
pickles/
: Contains the pickle files used for storing data.
Test-Data/
: Contains the test data used for the project.
DistanceMatrix.py
LagrangeInterpolation.py
lda.py
logistic_regression.py
nucleus_sampling.py
: Contains the NucleausSampling class used for generating the nucleus sampling model.
pca.py
: Contains the PCA class used for generating the PCA model.
perplexity
RungeExample.py
sampling.py
simple_classifier
The repository still contains the JavaScript file chatGPT-generator.js, which is only usable if you have a key for the OpenAI API. This key is not provided in the repository.
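For readers unfamiliar with the technique named in nucleus_sampling.py, the following is a minimal sketch of nucleus (top-p) sampling in its general form. It is not the repository's NucleausSampling class. The function names `nucleus_indices` and `nucleus_sample` and the default threshold of 0.9 are assumptions chosen for illustration.

```python
import random


def nucleus_indices(probs, p=0.9):
    """Return the smallest set of indices, most probable first,
    whose cumulative probability reaches `p`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Rank indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return nucleus


def nucleus_sample(probs, p=0.9, rng=random):
    """Sample one index from the nucleus, weighted by the original
    probabilities (random.choices normalises the weights)."""
    nucleus = nucleus_indices(probs, p)
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

For example, `nucleus_sample([0.5, 0.3, 0.15, 0.05], p=0.9)` only ever returns index 0, 1 or 2, because the least likely entry falls outside the nucleus.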