TUM-IDP-WS-20 / doc


Initial research on data preprocessing techniques #12

Closed MelikeSila closed 3 years ago

MelikeSila commented 3 years ago

Parent: #1

TODO: find and document data preprocessing techniques

farukcankaya commented 3 years ago

Data preprocessing is the step in machine learning that prepares raw data so that it is suitable for building and training machine learning models. In our case, the raw data is the given set of thousands of PDF files on accounting literature. Since we cannot use PDF files directly in well-known machine learning/NLP models, we need to organize those files so that they can be used with standard tools. Therefore, our data preprocessing steps will be as follows:

Each of the data preprocessing steps should be discussed before applying them to a task. The consequences of those steps might vary from task to task.
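To illustrate why each step deserves discussion, here is a minimal sketch of a few common text-cleaning steps, each behind its own switch. It assumes the text has already been extracted from a PDF. The function name `preprocess`, its flags, and the tiny stop-word list are hypothetical choices for illustration, not steps we have agreed on.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to"}


def preprocess(text, lowercase=True, strip_punct=True, drop_stop_words=True):
    """Turn raw text (assumed already extracted from a PDF) into tokens."""
    # Re-join words that were hyphenated across a line break.
    # Caveat: this also merges genuine hyphens that happen to end a line.
    text = re.sub(r"-\s*\n\s*", "", text)
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Replace every character that is not a word character or whitespace.
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    if drop_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


raw = "The Depreciation of assets is re-\ncorded in the U.S. ledger."

# All steps on: "U.S." is split into the tokens "u" and "s".
print(preprocess(raw))

# Punctuation and stop words kept: "u.s." survives as a single token.
print(preprocess(raw, strip_punct=False, drop_stop_words=False))
```

With every step switched on, the abbreviation "U.S." falls apart into two meaningless tokens, which is harmless for some tasks and harmful for others (for example, entity recognition). That is the kind of consequence worth checking per task before a step is applied.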