ASAG Thesis - Olaf Vrijmoet

Usage

In ./constants.py, set the 'run' variable of each phase you want to execute to True, then run python main.py. The phases build on each other, so they must be run in order from top to bottom. All of them can be set to True at once, but this can considerably extend the program's execution time because multiple models are trained.
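
As a rough illustration of this workflow, the sketch below shows one way per-phase 'run' flags could gate sequential execution. The `PHASES` list, the phase names, and the `run_phases` helper are hypothetical; they are not taken from the repository's constants.py or main.py, which may be organized differently.

```python
# Hypothetical sketch: the real constants.py and main.py may look different.

# Each phase carries its own 'run' flag and is listed in execution order.
PHASES = [
    {"name": "first_phase", "run": True},
    {"name": "second_phase", "run": False},
]


def run_phases(phases, handlers):
    """Run every enabled phase in list order and return the names that ran."""
    executed = []
    for phase in phases:
        if phase["run"]:
            handlers[phase["name"]]()
            executed.append(phase["name"])
    return executed


if __name__ == "__main__":
    # Placeholder handlers stand in for the real phase implementations.
    handlers = {
        "first_phase": lambda: print("running first phase"),
        "second_phase": lambda: print("running second phase"),
    }
    run_phases(PHASES, handlers)
```

Keeping the flags in an ordered list preserves the top-to-bottom order, so enabling a later phase never causes it to run before an earlier enabled one.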

Code structure

The code is structured into three main 'phases':

data

process

adding datasets

what happens in each folder

raw

All raw datasets are stored here. A raw dataset that is not in CSV format is converted to CSV here.

Make sure there are no Null values in any column of the dataset other than the student answers, reference answers, and questions!
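
A check like this could be automated before a dataset is added. The following is a minimal sketch using only the standard library. The column names in `TEXT_COLUMNS`, the spellings in `NULL_MARKERS`, and the `columns_with_nulls` helper are assumptions for illustration, not names from this repository.

```python
import csv
import io

# Assumed names of the text columns that are allowed to be empty.
TEXT_COLUMNS = {"student_answer", "reference_answer", "question"}
# Assumed spellings of a missing value in a raw CSV file.
NULL_MARKERS = {"", "null", "none", "nan"}


def columns_with_nulls(csv_file, text_columns=TEXT_COLUMNS):
    """Return the sorted names of non-text columns that contain a missing value."""
    offending = set()
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            # Skip surplus unnamed fields and the columns that may be empty.
            if column is None or column in text_columns:
                continue
            # DictReader yields None for fields missing from a short row.
            if value is None or value.strip().lower() in NULL_MARKERS:
                offending.add(column)
    return sorted(offending)


if __name__ == "__main__":
    sample = io.StringIO(
        "question,student_answer,reference_answer,score\n"
        "What is 2+2?,4,4,1\n"
        "What is 3+3?,,6,\n"
    )
    print(columns_with_nulls(sample))
```

In the sample, the empty student answer is tolerated while the empty score is reported.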

standardized

Here all raw CSV datasets are standardized to contain the same columns. Best estimates are added in this phase for missing values in data that is critical for the models.
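
The standardization step might look roughly like the sketch below, which maps one raw row onto a shared set of columns and fills gaps with a best-estimate default. The schema in `STANDARD_COLUMNS`, the `standardize_row` helper, and the example column names and defaults are all hypothetical; the repository's actual columns and estimation rules are not documented here.

```python
# Hypothetical target schema; the repository's actual column names may differ.
STANDARD_COLUMNS = [
    "question",
    "reference_answer",
    "student_answer",
    "score",
    "max_score",
]


def standardize_row(raw_row, column_map, defaults):
    """Map one raw row onto the shared schema, filling gaps with defaults."""
    standardized = {}
    for column in STANDARD_COLUMNS:
        # Look up which raw column, if any, feeds this standard column.
        raw_name = column_map.get(column)
        value = raw_row.get(raw_name) if raw_name is not None else None
        if value is None or value == "":
            # Fall back to the best estimate supplied for this column.
            value = defaults.get(column, "")
        standardized[column] = value
    return standardized


if __name__ == "__main__":
    raw = {"Q": "What is 2+2?", "ref": "4", "answer": "four", "points": "1"}
    column_map = {
        "question": "Q",
        "reference_answer": "ref",
        "student_answer": "answer",
        "score": "points",
    }
    # Assumed best estimate for a dataset that omits the maximum score.
    print(standardize_row(raw, column_map, defaults={"max_score": "1"}))
```

Passing the column mapping and defaults per dataset keeps the shared schema in one place while letting each raw dataset declare its own quirks.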

processed

The text of all standardized datasets is pre-processed and saved at different stages for the experiments on text pre-processing. The stages are:

potential structure todo