PiSchool / enterprise-document-classification

MIT License

Automatic Document Classification based on Image Analysis

This is a model for automatically identifying the type of a document (e.g. email, scientific publication, memo) from an image of the document. The model has been tested on the RVL-CDIP dataset, which is available at http://www.cs.cmu.edu/~aharley/rvl-cdip/
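RVL-CDIP contains 400,000 grayscale document images split evenly across 16 classes (320,000 for training, 40,000 for validation, 40,000 for test). The sketch below shows one way to turn its label files into (image path, class name) pairs. It assumes each label file line has the form `<relative image path> <integer label>` and that the integer labels follow the class order listed in the dataset's documentation; `parse_label_file` is a hypothetical helper, not part of this repository.

```python
# RVL-CDIP class names, indexed by the integer label (assumed to match the dataset's label order).
RVL_CDIP_CLASSES = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def parse_label_file(lines):
    """Parse '<image path> <label>' lines into (path, class name) pairs.

    Hypothetical helper: assumes one sample per line, with the integer label last.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, RVL_CDIP_CLASSES[int(label)]))
    return samples


if __name__ == "__main__":
    # Made-up example line; real paths come from the dataset's label files.
    print(parse_label_file(["images/example/doc_0001.tif 15\n"]))
    # → [('images/example/doc_0001.tif', 'memo')]
```

In practice the lines would come from opening one of the dataset's label files and passing the file object directly, since iterating over a file yields its lines.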

Installation

To install the required libraries (tested on Ubuntu 17.11), run:

Classify documents

Training the model from scratch

  1. Prepare a dataset:
  2. Train the model:

Set the model parameters in AutoDocClass.py, then run the script.
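When preparing a dataset, a smaller class-balanced subset of RVL-CDIP is often handy for quick experiments before training on the full 320,000 training images. The sketch below is a hypothetical illustration of that idea, not this repository's preparation script: `balanced_subset` and its `per_class` and `seed` parameters are assumptions, and the input is assumed to be a list of (image path, label) pairs.

```python
import random
from collections import defaultdict


def balanced_subset(samples, per_class, seed=0):
    """Pick up to `per_class` (path, label) pairs for each label, reproducibly.

    Hypothetical helper: `samples` is a list of (image path, label) pairs.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the same subset is drawn every run
    subset = []
    for label in sorted(by_label):
        paths = by_label[label]
        chosen = rng.sample(paths, min(per_class, len(paths)))
        subset.extend((p, label) for p in chosen)

    rng.shuffle(subset)  # mix classes so training batches are not ordered by label
    return subset


if __name__ == "__main__":
    toy = [(f"a{i}.tif", 0) for i in range(5)] + [(f"b{i}.tif", 1) for i in range(2)]
    print(balanced_subset(toy, per_class=3))
```

Sampling per class rather than uniformly keeps every document type represented even in a small subset; classes with fewer images than `per_class` are simply included in full.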

In progress:

Author

This project was developed by Roberto Calandrini during Pi School's AI programme in Fall 2017.
