TabulaRazr

Extract and browse tabular data from legacy financial documents with ease.

This repository is a partial release from prior work and the Top 5 submission at DeveloperWeek 2016 (video presentation). The more elaborate version builds semantic links between tables to efficiently compare deals and aggregate otherwise disconnected knowledge from a large collection of documents.

Issues, forks and heavy usage welcome. Distributed under AGPL v3.

Usage

After you upload a .txt or .pdf document, all identified tables are presented along with where they occur in the document. The screenshot shows a bond used to construct public buildings in Jurupa's school district, Riverside County. Additional information, such as inferred data types and positional features of table cells, is cached in .json files on the local filesystem.
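To give a feel for what "inferred data types and positional features" could look like, here is a minimal sketch. It is not the code used in this repository: the function names, the type labels, the JSON keys and the rule that two or more spaces separate cells are all assumptions made for illustration.

```python
import json
import re

# Hypothetical pattern: plain integers or decimals, with an optional sign.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")

def infer_type(cell):
    """Guess a coarse data type for one table cell (illustrative labels only)."""
    text = cell.strip()
    if not text:
        return "empty"
    # Drop thousands separators, a leading dollar sign and a trailing percent sign.
    cleaned = text.replace(",", "").lstrip("$").rstrip("%")
    if not NUMBER.fullmatch(cleaned):
        return "text"
    if text.endswith("%"):
        return "percent"
    return "float" if "." in cleaned else "integer"

def annotate_line(line):
    """Split a line into cells on runs of two or more spaces and record
    each cell's value, start column and inferred type."""
    cells = []
    for match in re.finditer(r"\S+(?: \S+)*", line):
        cells.append({
            "value": match.group(),
            "start": match.start(),
            "type": infer_type(match.group()),
        })
    return cells

if __name__ == "__main__":
    row = "June 1, 2016   1,250,000.00   4.5%"
    # The annotated row is plain data, so it can be cached as JSON.
    print(json.dumps(annotate_line(row), indent=2))
```

Keeping the start column next to each value is one way to later decide which cells line up into the same table column.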

Once the data is structured and annotated, it is relatively easy to automatically calculate domain-specific key figures. This customized version includes an experimental calculation of the internal rate of return for Municipal Bonds. Auxiliary information, such as unemployment rates, is often surfaced as well and can in turn be used as a basis to aggregate hidden knowledge.
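As an illustration of such a key figure, the sketch below computes an internal rate of return for irregularly dated cash flows (often called XIRR). It is a generic textbook approach and not necessarily how this project implements it: the function names, the Actual/365 day count, the bisection bracket and the sample cash flows are assumptions.

```python
from datetime import date

def xnpv(rate, cashflows):
    """Net present value of (date, amount) pairs, discounted from the
    first date on an assumed Actual/365 basis."""
    t0 = cashflows[0][0]
    return sum(
        amount / (1.0 + rate) ** ((d - t0).days / 365.0)
        for d, amount in cashflows
    )

def xirr(cashflows, lo=-0.99, hi=10.0, tol=1e-9, max_iter=200):
    """Find the rate at which xnpv is zero, using bisection on [lo, hi]."""
    f_lo = xnpv(lo, cashflows)
    f_hi = xnpv(hi, cashflows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign in the given bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = xnpv(mid, cashflows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return (lo + hi) / 2.0

if __name__ == "__main__":
    # Made-up example: pay 1000 for a bond, receive 1100 one year later.
    flows = [(date(2015, 1, 1), -1000.0), (date(2016, 1, 1), 1100.0)]
    print(round(xirr(flows), 4))  # approximately 0.1
```

Bisection is slower than Newton's method but cannot diverge once the bracket contains a sign change, which is convenient when the cash flows come from automatically extracted tables.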

Setup and run

Initial setup and run

npm install -g bower
pip install -r requirements.txt
bower install
python server.py

Navigate to http://localhost:7081 and upload an example document (see below). To use a port other than 7081, set the PORT variable.
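How server.py reads the port is not shown here. A common pattern, assuming PORT is an environment variable, looks roughly like the sketch below; the helper name resolve_port and the fallback behaviour are hypothetical.

```python
import os

def resolve_port(environ=os.environ, default=7081):
    """Return the port from the PORT variable, falling back to the
    default when it is unset, not a number, or out of range."""
    raw = environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        return default
    return port if 0 < port < 65536 else default

if __name__ == "__main__":
    print("Would listen on port", resolve_port())
```

Under that assumption, starting the server as `PORT=8080 python server.py` would make it reachable at http://localhost:8080.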

Updating

git pull
pip install -r requirements.txt
bower install

Folder structure

/templates ... Jinja2 html templates
/static ... all stylesheets and media go here
/static/ug/ ... user uploaded data and analysis files (graphs, json)

Example documents

One running instance with Municipal Bonds and other document categories lives at: http://tabularazr.eastus.cloudapp.azure.com:7081

Document	Category
Municipal Bond of the City of Flint: Debt Service Schedule	Municipal Bond
Deep Learning Paper: Empirical Findings	other
Annual Report Bosch 2014: Sales Figures	Business Report
Annual Report Oakland: Income per Sector from 2006 to 2010	(Business) Report
EY's Biotech Report 2015: Europe's Top IPOs in 2014	Business Report

Other documents

Example PDFs from public data (municipal bonds, audit reports, financial reviews)

Works with XIRR calculation feature

Other documents that may be of interest: