GiovanniPioDelvecchio / NLP-Project

This is the repository of the group DP(G)R for the NLP exam project, which consists of a model that predicts the values associated with arguments.

Dataset Preprocessing #2

Closed GiovanniPioDelvecchio closed 1 year ago

GiovanniPioDelvecchio commented 1 year ago

We need to implement a preprocessing pipeline that produces a dataset which can be fed to a NN model that classifies arguments with value labels. To do so, we need to choose an appropriate tokenizer (including how it treats OOV terms) and, if needed, clean the data based on the insights from the dataset description in the paper.
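As a starting point for discussion, the steps above (cleaning, tokenization, OOV handling, fixed-length encoding) could be sketched roughly as follows. This is only a minimal illustration using the standard library: the regex tokenizer, the `<pad>`/`<unk>` tokens, the function names and the toy sentences are all assumptions made for the example, not the tokenizer or data the project will actually use.

```python
import re
from collections import Counter

# Hypothetical special tokens: padding and out-of-vocabulary (OOV) placeholder.
PAD, UNK = "<pad>", "<unk>"


def clean(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower().strip())


def tokenize(text):
    """Split into alphanumeric words and single punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", clean(text))


def build_vocab(texts, min_freq=1):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = {PAD: 0, UNK: 1}
    for tok, freq in sorted(counts.items()):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert text to ids, mapping OOV tokens to UNK, then truncate or pad."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Toy sentences, made up for illustration only.
train = ["We should ban fast food.", "Fast food is cheap."]
vocab = build_vocab(train)
encoded = encode("Ban sugary food!", vocab, max_len=6)
print(encoded)
```

Here "sugary" and "!" never appear in the toy training sentences, so both map to the `<unk>` id. A subword tokenizer would handle OOV terms differently, so this choice is still open.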