ahirner / TabulaRazr-OS

Extract tabular data and semantically discover it with ease! (OS)
GNU Affero General Public License v3.0

TabulaRazr

Extract and browse tabular data from legacy financial documents with ease.

This repository is a partial release from prior work and the Top 5 submission at DeveloperWeek 2016 (video presentation). The more elaborate version builds semantic links between tables to efficiently compare deals and aggregate otherwise disconnected knowledge from a large collection of documents.

Issues, forks and heavy usage welcome. Distributed under AGPL v3.

Usage

After uploading a .txt or .pdf document, all identified tables are presented, along with where they occur in the document. [Image: View on Document] The screenshot shows a bond used to construct public buildings in Jurupa's school district, Riverside County. Additional information, such as inferred data types and positional features of table cells, is cached in .json files on the local filesystem.
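To illustrate what such cached annotations could look like, here is a minimal sketch. It is not the repository's code: the function names (`infer_cell_type`, `annotate_line`), the type labels, and the JSON layout are all hypothetical, and the sketch assumes that cells in a text line are separated by two or more spaces.

```python
import json
import re

def infer_cell_type(text):
    """Guess a coarse data type for one table cell (hypothetical labels)."""
    s = text.strip()
    if not s:
        return "empty"
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", s):
        return "date"
    if re.fullmatch(r"-?\d+(\.\d+)?\s*%", s):
        return "percent"
    grouped = r"\(?-?\$?\s*\d{1,3}(,\d{3})*(\.\d+)?\)?"  # e.g. $1,250,000.00
    plain = r"\(?-?\$?\s*\d+(\.\d+)?\)?"                  # e.g. 2016 or 12.5
    if re.fullmatch(grouped, s) or re.fullmatch(plain, s):
        return "currency" if "$" in s else "number"
    return "text"

def annotate_line(line):
    """Split a line into cells and record type and position for each cell."""
    cells = []
    # A cell is a run of tokens separated by single spaces (assumption).
    for column, match in enumerate(re.finditer(r"\S+(?: \S+)*", line)):
        cells.append({
            "column": column,
            "start": match.start(),  # character offset within the line
            "text": match.group(),
            "type": infer_cell_type(match.group()),
        })
    return cells

row = annotate_line("6/1/2016   $1,250,000.00   4.25%")
print(json.dumps(row, indent=2))  # this string could be written to a .json cache
```

Keeping the character offset next to the inferred type is one way to let a viewer map a cell back to its place in the source document.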

Once the data is structured and annotated, it is relatively easy to automatically calculate domain-specific key figures. This customized version includes an experimental calculation of the internal rate of return for Municipal Bonds. Often, auxiliary information such as unemployment rates is surfaced, which in turn can serve as a basis for aggregating hidden knowledge.
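For readers unfamiliar with the figure, the internal rate of return of irregularly dated cash flows (often called XIRR) is the annual rate at which their net present value is zero. The following is a minimal sketch of that idea, not the implementation used in this repository: the names `xnpv` and `xirr`, the actual/365 day count, the bisection solver, and the sample cash flows are all assumptions made for illustration.

```python
from datetime import date

def xnpv(rate, cashflows):
    """Net present value of (date, amount) pairs, discounted to the first date."""
    t0 = cashflows[0][0]
    return sum(
        amount / (1.0 + rate) ** ((d - t0).days / 365.0)
        for d, amount in cashflows
    )

def xirr(cashflows, low=-0.99, high=10.0, tol=1e-9, max_iter=200):
    """Find the rate where xnpv is zero by bisection on [low, high]."""
    f_low = xnpv(low, cashflows)
    f_high = xnpv(high, cashflows)
    if f_low == 0.0:
        return low
    if f_high == 0.0:
        return high
    if f_low * f_high > 0:
        raise ValueError("no sign change in [low, high]; cannot bracket a rate")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = xnpv(mid, cashflows)
        if abs(f_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid                 # root lies in the lower half
        else:
            low, f_low = mid, f_mid    # root lies in the upper half
    return (low + high) / 2.0

# Made-up bond: bought below par, two coupons a year, redeemed at par.
bond = [
    (date(2016, 1, 1), -950.0),
    (date(2016, 7, 1), 25.0),
    (date(2017, 1, 1), 25.0),
    (date(2017, 7, 1), 25.0),
    (date(2018, 1, 1), 1025.0),
]
print("annualized return: {:.4%}".format(xirr(bond)))
```

Bisection is slower than Newton's method but does not need a derivative and cannot diverge once a sign change is bracketed, which is convenient when the cash flows come from automatically extracted tables.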

Setup and run

Initial setup and run

npm install -g bower
pip install -r requirements.txt
bower install
python server.py

Navigate to http://localhost:7081 and upload an example document (see below). To use a port other than 7081, set your PORT variable accordingly.
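How server.py reads the port is not shown here. A common pattern, assuming PORT is an environment variable, looks roughly like the sketch below; `resolve_port` is a hypothetical helper, not a function from this repository.

```python
import os

def resolve_port(environ=os.environ, default=7081):
    """Return the port from the PORT variable, or the default if it is unset."""
    return int(environ.get("PORT", default))

print(resolve_port())
```

Under that assumption, starting the server with `PORT=8080 python server.py` would serve the app on http://localhost:8080.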

Updating

git pull
pip install -r requirements.txt
bower install

Folder structure

Example documents

One running instance with Municipal Bonds and other document categories lives at: http://tabularazr.eastus.cloudapp.azure.com:7081

| Document | Category |
| --- | --- |
| Municipal Bond of the City of Flint: Debt Service Schedule | Municipal Bond |
| Deep Learning Paper: Empirical Findings | other |
| Annual Report Bosch 2014: Sales Figures | Business Report |
| Annual Report Oakland: Income per Sector from 2006 to 2010 | (Business) Report |
| EY's Biotech Report 2015: Europe's Top IPOs in 2014 | Business Report |

Other documents

Choose any financial document, research paper, or annual report to upload yourself, or browse these sources.

Example PDFs from public data (municipal bonds, audit reports, financial reviews)

Works with XIRR calculation feature

These documents can be successfully processed by the XIRR feature

Other documents that may be of interest: