
Automated Essay Scoring using BERT
http://www.gauravpande.in/AES/

AES_DL

The code in this repository explores different deep learning techniques, both on the individual essay sets and on the whole dataset.

A little bit about the data:

We used the Automated Student Assessment Prize (ASAP) dataset by The Hewlett Foundation (Hewlett, 2012; accessed March 12, 2020). This dataset consists of essays written by students from 7th to 10th grade. The essays are divided into 8 sets, and each set has a prompt associated with it. There are 2 types of prompt. Type 1: Persuasive / Narrative / Expository. Type 2: Source Dependent Responses. The first type of prompt asks students to state their opinion about a certain topic. The second type has a required reading associated with it, and students are expected to answer a question based on their understanding of this reading. Different prompts have been graded by different numbers of graders, but each set has a domain 1 score, which is used as the target score.
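
As a minimal illustration of the data layout, the sketch below loads the ASAP training file with pandas and inspects the per-set score columns. The file name and encoding are assumptions based on the public Kaggle release, not something defined by this repository.

```python
import pandas as pd

# Load the ASAP training data (file name/encoding assumed from the Kaggle release).
df = pd.read_csv("training_set_rel3.tsv", sep="\t", encoding="latin-1")

# Each row is one essay: `essay_set` identifies the prompt (1-8) and
# `domain1_score` is the resolved score used as the training target.
print(df["essay_set"].value_counts().sort_index())
print(df.groupby("essay_set")["domain1_score"].agg(["min", "max"]))
```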

So what did we try:

The approaches tried in DL are:

  1. Try 3 different architectures involving LSTMs, BiLSTMs, and CNNs, on the individual sets and on the whole dataset separately.
  2. Use Word2Vec and BERT embeddings for the feature vector representation (see the short embedding sketch after this list).
  3. Hyperparameter tuning to optimize the loss and increase the mean QWK (quadratic weighted kappa).
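
As a rough sketch of point 2, the snippet below turns tokenized essays into fixed-length feature vectors by averaging Word2Vec word vectors with gensim. The toy corpus, vector size, and averaging scheme are illustrative assumptions, not the repository's exact settings.

```python
import numpy as np
from gensim.models import Word2Vec

# Illustrative corpus: each essay is a list of lowercase tokens.
tokenized_essays = [
    ["the", "author", "argues", "that", "computers", "help", "students"],
    ["libraries", "should", "not", "censor", "books", "or", "music"],
]

# Train a small Word2Vec model (gensim >= 4 uses `vector_size`; older versions use `size`).
w2v = Word2Vec(tokenized_essays, vector_size=300, window=5, min_count=1, workers=1)

def essay_vector(tokens, model):
    """Average the vectors of in-vocabulary tokens into one essay-level vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

features = np.stack([essay_vector(e, w2v) for e in tokenized_essays])
print(features.shape)  # (num_essays, 300)
```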

Currently, the models are trained in Keras (with TensorFlow as the backend).
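
For context, here is a minimal Keras sketch of the kind of BiLSTM regressor described above, trained on per-essay embedding vectors and evaluated with quadratic weighted kappa via scikit-learn. The layer sizes, score range, and training settings are placeholders rather than the repository's tuned values.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.metrics import cohen_kappa_score

# Toy stand-in data: 32 essays, each a single 300-dimensional embedding vector
# treated as a length-1 sequence. Shapes and the 0-12 score range are illustrative.
x = np.random.rand(32, 1, 300).astype("float32")
y = np.random.randint(0, 13, size=(32,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 300)),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(1),  # regression head predicting the raw score
])
model.compile(loss="mse", optimizer="rmsprop")
model.fit(x, y, epochs=2, batch_size=8, verbose=0)

# Round predictions to integer scores and compute quadratic weighted kappa (QWK).
preds = np.clip(np.rint(model.predict(x, verbose=0)).flatten(), 0, 12).astype(int)
print("QWK:", cohen_kappa_score(y.astype(int), preds, weights="quadratic"))
```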

Prerequisites

Installation

I would recommend using Google Colab, or better, a machine with GPU access. If you are running this locally, follow these instructions:

pip install virtualenv
virtualenv aes
source aes/bin/activate
pip install -r requirements.txt

Training the models

To train using BERT embeddings, per set and on the whole dataset, run:

python train_bert_sets.py
python train_bert_all.py

To train using Word2Vec embeddings on the whole dataset, run:

python train_word2vec_all.py

Note:

[Future Work]: