eregs / regulations-parser

Parser for U.S. federal regulations and other regulatory information
Creative Commons Zero v1.0 Universal

Simplify build dependency management #356

Open cmc333333 opened 7 years ago

History

Originally, we had a single entry point to run the parser, and every action took place in memory. We then added a file-based caching system to speed this up, then split the steps into separate commands and switched to editable (and readable) files for intermediate output. This led to a rather complicated solution in which output files formed a directed graph, and downstream files were invalidated when upstream ones were edited (similar to make); we implemented this with the dagger and, later, networkx libraries. After that, we wanted the parser to run on a web host, so we began incorporating components of Django, including replacing the file-system-based intermediate output with database-backed versions.
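
For readers unfamiliar with the make-style invalidation mentioned above, here is a minimal sketch of the idea using only the standard library. It is not the parser's actual code: the `DependencyGraph` class, its methods, and the file names are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical stand-in for the directed graph of intermediate outputs."""

    def __init__(self):
        # Maps an upstream node to the nodes built directly from it.
        self._downstream = defaultdict(set)

    def add_dependency(self, upstream, downstream):
        """Record that `downstream` is derived from `upstream`."""
        self._downstream[upstream].add(downstream)

    def invalidated_by(self, edited):
        """Return every node that must be rebuilt after `edited` changes."""
        stale = set()
        queue = deque([edited])
        while queue:
            node = queue.popleft()
            # .get() avoids creating empty entries for leaf nodes.
            for child in self._downstream.get(node, ()):
                if child not in stale:
                    stale.add(child)
                    queue.append(child)
        return stale


# Illustrative file names only; the real intermediate outputs differ.
graph = DependencyGraph()
graph.add_dependency("notice.xml", "tree.json")
graph.add_dependency("tree.json", "layer.json")
graph.add_dependency("tree.json", "diff.json")
graph.add_dependency("other.xml", "other_tree.json")

stale = graph.invalidated_by("notice.xml")
```

Editing `notice.xml` marks everything reachable from it as stale, while outputs derived from `other.xml` are left alone. Even in this toy form, the set of files affected by an edit is only discoverable by walking the graph, which hints at why the real dependencies became hard to follow.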

Problem

The current solution combines many of the worst aspects of the steps that came before. Notably, understanding dependencies is very hard due to the layers of indirection. We've also lost the ability to easily inspect (and modify) intermediate output.

Solution