Digital-Defiance / OpenBook

Enhance organizational transparency by indexing structured Markdown files into Excel & API queryable views, enabling detailed tracking such as membership or grant usage through transactional records, while maintaining data in a version-controlled, queryable system.
MIT License
data-transformation excel git markdown mongo node

OpenBook

Note: OpenBook is in beta, is incomplete, and is still under development. (See #Complete)


OpenBook is an experimental orchestrator that syncs a git-versioned Markdown filesystem into MongoDB. Rather than reinventing the wheel, we opted to store the data in MongoDB, and this service uses git to determine which updates MongoDB needs. The service can currently be run from cron or manually to sync the data. In the future it will support webhooks to update the data automatically.
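As a rough illustration of how git can drive the sync, the sketch below turns the output of `git diff --name-status` into a list of records to upsert or delete. The `SyncAction` type and the `planSync` function are hypothetical names invented for this example; OpenBook's actual sync code may work differently.

```typescript
// Hypothetical sketch: these names are not taken from the OpenBook codebase.
type SyncAction =
  | { kind: "upsert"; path: string }
  | { kind: "delete"; path: string };

// Only Markdown files are treated as records in this sketch.
const isMarkdown = (p: string): boolean => p.toLowerCase().endsWith(".md");

// Parse the tab-separated output of `git diff --name-status <old> <new>`.
// Lines look like "M\tpath", "A\tpath", "D\tpath" or "R100\told\tnew".
function planSync(nameStatus: string): SyncAction[] {
  const actions: SyncAction[] = [];
  for (const line of nameStatus.split("\n")) {
    const [status, first, second] = line.split("\t");
    if (!status || !first) continue; // skip blank or malformed lines
    if (status.startsWith("R") && second) {
      // A rename removes the old record and indexes the new one.
      if (isMarkdown(first)) actions.push({ kind: "delete", path: first });
      if (isMarkdown(second)) actions.push({ kind: "upsert", path: second });
    } else if (status === "D") {
      if (isMarkdown(first)) actions.push({ kind: "delete", path: first });
    } else if (status === "A" || status === "M") {
      if (isMarkdown(first)) actions.push({ kind: "upsert", path: first });
    }
  }
  return actions;
}
```

A cron job could feed this function the diff between the last synced commit and the current `HEAD`, then apply each action to MongoDB.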

It takes data from a human-readable repository of Markdown files (at most one folder level deep). It is not designed for huge databases. It is intended for relatively small datasets (hundreds or thousands of records), such as member lists. It is limited by the filesystem and its speed, and by the additional overhead of git and of parsing Markdown.

It is designed to be Markdown friendly, although users do need to be careful about the format of their data. A consistent structure is essential for querying into the nested structure of a document, and files are expected to be GitHub Flavored Markdown (GFM). Depending on the structure of the document, some lines may need two trailing spaces to format correctly, which GFM treats as a hard line break.

Internally we use remark (available on npm), which used to be called mdast, to convert the Markdown into tables of data that can be queried from MongoDB. We store the entire parsed root node, and a further indexing stage flattens the nodes for easier querying.
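To make the flattening stage concrete, here is a minimal sketch that walks an mdast-shaped tree and emits one flat row per node. The `MdNode` and `FlatNode` interfaces and the dotted `path` scheme are assumptions made for this example, not OpenBook's actual index schema, and the tree is hand-built rather than produced by remark so that the example has no dependencies.

```typescript
// Minimal subset of the mdast node shape (type, optional value/depth/children).
interface MdNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdNode[];
}

// Hypothetical flattened row; the real index may store different fields.
interface FlatNode {
  path: string; // position in the tree, e.g. "root.0.1"
  type: string;
  depth?: number; // heading level, when present
  text: string; // concatenated text of the node and its descendants
}

// Collect the plain text under a node.
function textOf(node: MdNode): string {
  if (typeof node.value === "string") return node.value;
  return (node.children ?? []).map(textOf).join("");
}

// Depth-first walk that emits the node itself, then each child in order.
function flatten(node: MdNode, path = "root"): FlatNode[] {
  const row: FlatNode = { path, type: node.type, text: textOf(node) };
  if (node.depth !== undefined) row.depth = node.depth;
  const rows: FlatNode[] = [row];
  (node.children ?? []).forEach((child, i) => {
    rows.push(...flatten(child, `${path}.${i}`));
  });
  return rows;
}

// Hand-built tree for a record with a heading and one paragraph.
const exampleTree: MdNode = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Alice" }] },
    { type: "paragraph", children: [{ type: "text", value: "Joined: 2023" }] },
  ],
};
const exampleRows = flatten(exampleTree);
```

Flat rows like these are easier to filter with a simple query (for example, all nodes of type `heading`) than a deeply nested document.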

Once the data is imported, our Node Express API performs some essential queries and functions on the MongoDB data. Data moves easily between Markdown, JSON, and HTML, and a feature we call Views can also output Excel files.

Here is an example Excel file generated from the raw markdown entries in our [Digital-Defiance repo](https://github.com/Digital-Defiance/Digital-Defiance/tree/main/Public%20Data/2023%20Cash%20Flow).

Rationale

As a non-profit, most of what Digital Defiance does needs to be public. OpenBook stores data so that it is version controlled and human-readable, with records that people can manage and add to directly. Examples would include member lists and possibly financial transactions.

OpenBook seeks to be a way for similar organizations to manage their membership records and other data.

Digital Defiance seeks to use OpenBook to store and display data for our membership, grants, donations, and a running expense sheet to start with. A front end, likely at transparency.digitaldefiance.org, will be created to expose the data in its various forms and to search and display records.

Mechanism of action

Format

Largely GitHub Flavored Markdown (GFM)

OpenBook will parse the markdown and respond to queries in a way that is similar to a database. Most of the functionality will be in querying and parsing the data.

See Record Format for more information on the format.

Directory Structure

Directory structure is critical to the functionality of OpenBook. Each table is housed within its own directory, and each record is a markdown file within that directory. Tables/directories cannot be nested at this time. Further nesting is achieved within the markdown itself.

See Directory Structure for more information on the directory structure.
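The table-per-directory rule described in this section can be sketched as a small path check. The `RecordLocation` type and the `locateRecord` function are hypothetical names for this example, and the sketch assumes record files use a `.md` extension and that paths are relative to the data root with `/` separators.

```typescript
// Hypothetical result type: which table (directory) and record (file) a path names.
interface RecordLocation {
  table: string;
  record: string;
}

// Accept only paths of the form "<table>/<record>.md".
// Returns null for top-level files, nested directories, and non-Markdown files.
function locateRecord(relativePath: string): RecordLocation | null {
  const parts = relativePath.split("/").filter((p) => p.length > 0);
  const table = parts[0];
  const file = parts[1];
  if (parts.length !== 2 || table === undefined || file === undefined) {
    return null; // not exactly one directory level deep
  }
  if (!file.toLowerCase().endsWith(".md")) return null;
  return { table, record: file.slice(0, -3) }; // drop the ".md" suffix
}
```

With this rule, `Members/alice.md` maps to the table `Members` and the record `alice`, while `Members/2023/alice.md` is rejected because nested tables are not supported.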

GitDB Database Setup

Copy .env.example to .env and fill in the values. There are options to locate the data in a subdirectory within each of the repositories.

Developing/Running (Windows 11)

Tasks

Complete

In Progress

TODO

Authors, License

This work is provided under an MIT license by the Digital Defiance Contributors.

The lead architect is Jessica Mulein.