
federatedbookkeeping / liquiddataspaces

Thoughts on Liquid Data


Plan for a prototype #4

Open michielbdejong opened 10 months ago

michielbdejong commented 10 months ago

I've started prototyping this several times, both in JS and in PHP:

Some of the most interesting snippets from those attempts are:

the algorithms for bank statement deduplication with lookalike groups
modelling the world instead of a single organisation in a "prejournal" way
the separation between statements and movements, to support both distributed versioning and contradiction
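A minimal sketch of deduplication with lookalike groups follows. The algorithm from the earlier prototypes is not reproduced here. This version assumes that a lookalike group is the set of entries sharing account, date, amount and (normalised) description. It also assumes that when two exports overlap, a group really holds as many transactions as the largest count seen in any single export. `Entry`, `lookalike_key` and `deduplicate` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple


class Entry(NamedTuple):
    """One line from a bank statement export (hypothetical minimal schema)."""
    account: str
    date: str          # ISO 8601 date, e.g. "2023-05-01"
    amount_cents: int  # negative for outgoing payments
    description: str


def lookalike_key(e: Entry) -> tuple:
    # Entries with the same key cannot be told apart from the statement alone.
    return (e.account, e.date, e.amount_cents, e.description.strip().lower())


def deduplicate(exports: list[list[Entry]]) -> list[Entry]:
    """Merge overlapping exports of the same account(s).

    Within one export, lookalike entries are assumed to be genuinely distinct
    (e.g. two identical coffees on the same day). Across exports, the same
    group may repeat because date ranges overlap, so the number of real
    transactions in a group is the maximum count seen in any single export,
    not the sum.
    """
    best: dict[tuple, int] = {}
    sample: dict[tuple, Entry] = {}
    for export in exports:
        counts = Counter(lookalike_key(e) for e in export)
        for key, n in counts.items():
            best[key] = max(best.get(key, 0), n)
        for e in export:
            # Keep the first entry seen as the representative of its group.
            sample.setdefault(lookalike_key(e), e)
    merged: list[Entry] = []
    for key, n in best.items():
        merged.extend([sample[key]] * n)
    return sorted(merged, key=lambda e: (e.account, e.date))
```

Taking the maximum rather than the sum lets two genuinely identical payments on the same day survive, while the same payment downloaded twice in overlapping exports is counted only once.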

I am now in the luxurious position where I can start an unfunded software project for the coming years, and make it as big as I want it to be. I think the best place to start is dogfooding: grooming my own data downloads, manually at first and then automating the process step by step, producing small reusable tools along the way.

michielbdejong commented 10 months ago

I could start by aggregating:

my bank statements and personal finance
my photos and digital memorabilia collections, including messaging logs
references to my physical possessions
task tracking and time tracking, including unfinished projects

michielbdejong commented 10 months ago

The first tools to develop should cover how to refer to data sources and how to maintain the code that extracts information from them.

So, for instance, I should keep notes on how I download data exports, and then have tools that:

DOWNLOAD idempotently extract information from a download to form a data stream
STREAMIFY link data from different streams together (e.g. a transaction that is visible in both the sending and the receiving bank account)
TRANSLATE data formats
CROSS-IDENTIFY entities, to allow me to add relation sources that for instance link a bank transaction to an invoice or a budget category, or a bank account number to a person, based on heuristics or a one-off rule
FORWARD to any API or storage that might want to import a copy
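The following is a rough sketch of three of those steps; translate is left out, and all names are assumptions. `streamify` turns one downloaded CSV with hypothetical `date`, `counterparty` and `amount_cents` columns into a stream of movements. Their ids are derived from content, so running it twice on the same download yields the same stream. `cross_identify` links a transfer seen from both the sending and the receiving account using a simple matching heuristic. `forward` hands each movement to whatever callable stands in for an API or storage backend.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Movement:
    id: str  # derived from content, so re-running extraction gives the same id
    account: str
    counterparty: str
    date: str
    amount_cents: int


def streamify(raw_csv: str, account: str) -> list[Movement]:
    """Extract movements from one downloaded CSV export (hypothetical columns).

    Idempotent: the id hashes the row content plus its occurrence index among
    identical rows, so extracting the same download twice yields the same stream.
    """
    seen: dict[str, int] = {}
    out = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        content = f"{account}|{row['date']}|{row['counterparty']}|{row['amount_cents']}"
        n = seen.get(content, 0)
        seen[content] = n + 1
        digest = hashlib.sha256(f"{content}|{n}".encode()).hexdigest()[:16]
        out.append(Movement(digest, account, row["counterparty"], row["date"],
                            int(row["amount_cents"])))
    return out


def cross_identify(a: Iterable[Movement], b: Iterable[Movement]) -> list[tuple[str, str]]:
    """Link a transfer seen from both the sending and the receiving account.

    Assumed heuristic: same date, opposite amounts, and each side names the
    other account as counterparty. Each movement in `b` is linked at most once.
    """
    links = []
    unused = list(b)
    for m in a:
        for other in unused:
            if (m.date == other.date and m.amount_cents == -other.amount_cents
                    and m.counterparty == other.account
                    and other.counterparty == m.account):
                links.append((m.id, other.id))
                unused.remove(other)
                break  # stop scanning `unused` right after mutating it
    return links


def forward(movements: Iterable[Movement], sink: Callable[[Movement], None]) -> None:
    """Hand each movement to any API client or storage that wants a copy."""
    for m in movements:
        sink(m)
```

Real cross-identification would need fuzzier matching (value dates that differ by a day, fees deducted on one side), but the shape stays the same: two streams in, a list of links out.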

michielbdejong commented 10 months ago

I don't need to dogfood GUIs, because I won't be needing them myself anyway; other people can do that. But I should dogfood Solid Data Modules!

michielbdejong commented 10 months ago

I could start with bank statements, or with GitHub issues, for instance.

Should I make a copy of all downloaded information? Probably a good idea, yes. See the 5 steps above: download, streamify, translate, cross-identify, forward.
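Keeping every raw download could look like the following sketch. It assumes a content-addressed layout in a local directory, with each file named by its SHA-256 digest; `archive_download` is a hypothetical helper. Because the file name depends only on the bytes, archiving the same download twice is a no-op, which fits with idempotent extraction further down the pipeline.

```python
import hashlib
from pathlib import Path


def archive_download(payload: bytes, archive_dir: Path) -> Path:
    """Store a raw download under its SHA-256 digest and return its path.

    Storing the same bytes twice does nothing, so re-running a download script
    never creates duplicate copies, and later steps can point at the exact
    bytes they were extracted from.
    """
    digest = hashlib.sha256(payload).hexdigest()
    # Two-character fan-out directory keeps any single directory small.
    target = archive_dir / digest[:2] / digest
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(payload)
        # Rename into place so a crash never leaves a half-written file
        # under the final name.
        tmp.replace(target)
    return target
```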

michielbdejong commented 10 months ago

It would be nice to see if I can stabilise my full personal data set: spend a full week only collecting, listing and scraping data sources into one root index, covering both the payloads and the metadata about how I obtained them.
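One possible shape for such a root index is a single JSON file that lists every source with its payloads (paths or content hashes), alongside free-text notes and timestamps on how they were obtained. This is a hypothetical sketch: the field names and the choice of JSON are assumptions, not an existing schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SourceEntry:
    """One data source in the root index (all field names are hypothetical)."""
    name: str            # e.g. "bank-nl01"
    payloads: list[str]  # paths or content hashes of the raw downloads
    how_obtained: str    # free-text notes: export menu, API call, scraper...
    obtained_at: list[str] = field(default_factory=list)  # ISO 8601 timestamps


def write_root_index(entries: list[SourceEntry], path: str) -> None:
    """Serialise every source, payloads plus provenance, into one JSON file."""
    index = {"sources": [asdict(e) for e in sorted(entries, key=lambda e: e.name)]}
    with open(path, "w", encoding="utf-8") as f:
        # Sorted keys and stable indentation keep diffs of the index readable.
        json.dump(index, f, indent=2, sort_keys=True)


def read_root_index(path: str) -> list[SourceEntry]:
    """Load the root index back into SourceEntry objects."""
    with open(path, encoding="utf-8") as f:
        return [SourceEntry(**s) for s in json.load(f)["sources"]]
```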

michielbdejong commented 10 months ago

I should distinguish between data I already have but whose source has dried up, and recurring data sources that I should keep harvesting on a regular basis, and make a list of those.
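A small sketch of keeping that list. It assumes each source records when it was last harvested and, for recurring sources only, a harvest interval. A missing interval marks a source that has dried up. `Source`, `split_sources` and `due_for_harvest` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Source:
    name: str
    last_harvested: date
    # None marks a source that has dried up: keep its data, stop harvesting.
    harvest_every: Optional[timedelta] = None


def split_sources(sources: list[Source]) -> tuple[list[Source], list[Source]]:
    """Return (dried_up, recurring)."""
    dried = [s for s in sources if s.harvest_every is None]
    recurring = [s for s in sources if s.harvest_every is not None]
    return dried, recurring


def due_for_harvest(sources: list[Source], today: date) -> list[str]:
    """Names of recurring sources whose harvest interval has elapsed."""
    _, recurring = split_sources(sources)
    return sorted(
        s.name for s in recurring if today - s.last_harvested >= s.harvest_every
    )
```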