Extract and browse tabular data from legacy financial documents with ease.
This repository is a partial release from prior work and the Top 5 submission at DeveloperWeek 2016 (video presentation). The more elaborate version builds semantic links between tables to efficiently compare deals and aggregate otherwise disconnected knowledge from a large collection of documents.
Issues, forks and heavy usage welcome. Distributed under AGPL v3.
After uploading a .txt or .pdf document, all identified tables are presented, along with where they occur in the document.
The screenshot shows a bond used to construct public buildings in Jurupa's school district, Riverside County.
Additional information, such as inferred data types and positional features of table cells, is cached in .json files on the local filesystem.
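As a rough illustration of what such cell annotations could look like, here is a minimal sketch. The function names (`infer_cell_type`, `annotate_row`), the type labels, and the JSON layout are all assumptions made for this example; they are not taken from the repository's actual cache format.

```python
import json
import re


def infer_cell_type(text):
    """Guess a coarse data type for one table cell (hypothetical labels)."""
    s = text.strip()
    if not s:
        return "empty"
    # Dates such as 6/1/2016
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", s):
        return "date"
    # Percentages such as 4.25%
    if re.fullmatch(r"-?\d+(\.\d+)?\s?%", s):
        return "percent"
    # Numbers with optional dollar sign and thousands separators
    grouped = re.fullmatch(r"-?\$?\d{1,3}(,\d{3})*(\.\d+)?", s)
    plain = re.fullmatch(r"-?\$?\d+(\.\d+)?", s)
    if grouped or plain:
        return "currency" if "$" in s else "number"
    return "text"


def annotate_row(cells, row_index):
    """Attach position and inferred type to every cell of one row."""
    return [
        {"row": row_index, "col": col, "value": cell, "type": infer_cell_type(cell)}
        for col, cell in enumerate(cells)
    ]


row = annotate_row(["6/1/2016", "$1,250,000.00", "4.25%", "Principal"], 0)
print(json.dumps(row, indent=2))
```

Keeping the row and column index next to each value is one simple way to preserve positional features when the annotations are written to disk.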
Once the data is structured and annotated, it is relatively easy to automatically calculate domain-specific key figures. This customized version includes an experimental calculation of the internal rate of return for Municipal Bonds. Auxiliary information, such as unemployment rates, is often surfaced as well and can in turn be used as a basis to aggregate hidden knowledge.
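For readers unfamiliar with the figure, the internal rate of return for irregularly dated cash flows (often called XIRR) is the annual rate r at which the sum of C_i / (1 + r)^((d_i - d_0) / 365) equals zero. The sketch below solves for r by bisection. It is not the repository's implementation: the function names, the actual/365 day count, and the search bracket are assumptions chosen for illustration.

```python
from datetime import date


def xnpv(rate, cashflows):
    """Net present value of (date, amount) pairs, discounted to the first date.

    Assumes an actual/365 day count.
    """
    d0 = cashflows[0][0]
    return sum(
        amount / (1.0 + rate) ** ((d - d0).days / 365.0)
        for d, amount in cashflows
    )


def xirr(cashflows, low=-0.99, high=10.0, tol=1e-9, max_iter=200):
    """Find the rate where xnpv is zero, using bisection on [low, high]."""
    f_low = xnpv(low, cashflows)
    f_high = xnpv(high, cashflows)
    if f_low * f_high > 0:
        raise ValueError("no sign change of NPV inside the search bracket")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = xnpv(mid, cashflows)
        if abs(f_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


# Illustrative cash flows: pay 1000 today, receive 1100 one year later.
flows = [(date(2015, 1, 1), -1000.0), (date(2016, 1, 1), 1100.0)]
rate = xirr(flows)
print(round(rate, 6))
```

Bisection is slower than Newton's method but does not need a derivative and cannot diverge once the bracket contains a sign change, which makes it a reasonable choice for cash flows extracted from noisy tables.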
npm install -g bower
pip install -r requirements.txt
bower install
python server.py
Navigate to http://localhost:7081 and upload an example document (see below).
You may set the PORT variable to use a port other than 7081.
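A common way for a Python server to honor such a setting is to read it from the environment and fall back to the default. The sketch below assumes PORT is an environment variable; the helper name `resolve_port` is hypothetical and not taken from server.py.

```python
import os


def resolve_port(default=7081):
    """Return the port from the PORT environment variable, or the default."""
    return int(os.environ.get("PORT", default))


print(resolve_port())
```

Under that assumption, starting the server with PORT set to 8080 in the shell would make it listen on port 8080 instead of 7081.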
git pull
pip install -r requirements.txt
bower install
One running instance with Municipal Bonds and other document categories lives at: http://tabularazr.eastus.cloudapp.azure.com:7081
Document | Category |
---|---|
Municipal Bond of the City of Flint: Debt Service Schedule | Municipal Bond |
Deep Learning Paper: Empirical Findings | other |
Annual Report Bosch 2014: Sales Figures | Business Report |
Annual Report Oakland: Income per Sector from 2006 to 2010 | (Business) Report |
EY's Biotech Report 2015: Europe's Top IPOs in 2014 | Business Report |
Choose any financial document, research paper or annual report to upload yourself. Or browse these sources.
These documents can be successfully processed by the XIRR feature