federatedbookkeeping / liquiddataspaces

Thoughts on Liquid Data

Plan for a prototype #4

Open michielbdejong opened 9 months ago

michielbdejong commented 9 months ago

I've started prototyping this several times, both in JS and in PHP:

Some of the most interesting snippets from that would be:

I am now in the luxurious position where I can start an unfunded software project for the coming years and make it as big as I want it to be. I think the best place to start is dogfooding: grooming my own data downloads, at first manually, then automating it step by step and producing small reusable tools along the way.

michielbdejong commented 9 months ago

I could start by aggregating:

michielbdejong commented 9 months ago

Tools to develop immediately would include ways to refer to data sources and to maintain the code that extracts information from them.

So for instance I should make notes of how I download data exports, and then have tools that:

  1. DOWNLOAD: idempotently extract information from a download to form a data stream
  2. STREAMIFY: link data from different streams together (e.g. a transaction that is visible in both the sending and the receiving bank account)
  3. TRANSLATE: convert between data formats
  4. CROSS-IDENTIFY: match entities, to allow me to add relation sources that for instance link a bank transaction to an invoice or a budget category, or a bank account number to a person, based on heuristics or a one-off rule
  5. FORWARD: send to any API or storage that might want to import a copy
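
The extraction and linking steps from the list above could be sketched roughly like this. It is only a sketch under assumptions: the CSV column names (`date`, `amount_cents`, `counterparty`), the `Record` shape, the account identifiers and the matching rule are all made up for illustration, not taken from any real bank export.

```python
import csv
import hashlib
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One entry in a data stream (hypothetical shape)."""
    record_id: str      # stable hash of the row contents
    account: str
    date: str
    amount_cents: int
    counterparty: str


def extract(account: str, csv_text: str) -> dict[str, Record]:
    """Turn a downloaded CSV export into a stream keyed by a stable id.

    Ids are derived from the row contents, so re-running the extraction
    on the same download yields the same stream (idempotent).
    Limitation: two identical rows would collapse into one record.
    """
    stream = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = "|".join([account, row["date"], row["amount_cents"], row["counterparty"]])
        record_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
        stream[record_id] = Record(record_id, account, row["date"],
                                   int(row["amount_cents"]), row["counterparty"])
    return stream


def link_transfers(a: dict[str, Record], b: dict[str, Record]) -> list[tuple[str, str]]:
    """Pair records that look like the same transfer seen from both accounts.

    Heuristic (an assumption): same date, opposite amounts, and each
    record names the other account as its counterparty.
    """
    links = []
    for ra in a.values():
        for rb in b.values():
            if (ra.date == rb.date
                    and ra.amount_cents == -rb.amount_cents
                    and ra.counterparty == rb.account
                    and rb.counterparty == ra.account):
                links.append((ra.record_id, rb.record_id))
    return links


# Made-up sample exports for two accounts.
checking = extract(
    "NL01CHECKING",
    "date,amount_cents,counterparty\n"
    "2024-01-05,-5000,NL02SAVINGS\n"
    "2024-01-06,-1200,GROCER\n",
)
savings = extract(
    "NL02SAVINGS",
    "date,amount_cents,counterparty\n"
    "2024-01-05,5000,NL01CHECKING\n",
)
links = link_transfers(checking, savings)
```

Deriving ids from content rather than from row position means a fresh download of an overlapping date range would map the same transactions onto the same ids, which is what makes the later linking and forwarding steps safe to repeat.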

michielbdejong commented 9 months ago

I don't need to dogfood GUIs, because I will not be needing them myself anyway. Other people can do that. But I should dogfood Solid Data Modules!

michielbdejong commented 9 months ago

I could start with bank statements, or with GitHub issues for instance.

Should I make a copy of all downloaded information? Probably a good idea, yes. See the 5 steps above: download, streamify, translate, cross-identify, forward.

michielbdejong commented 9 months ago

It would be nice to see if I can stabilise my full personal data set: spend a full week only collecting, listing and scraping data sources, both the payloads and the metadata about how I obtained them, into one root index.
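
One possible shape for such a root index, as a minimal sketch: each entry records a hash of the payload plus the metadata about how it was obtained. The field names, the `index_entry` and `add_to_index` helpers and the sample values are all hypothetical.

```python
import hashlib
import json
from datetime import date


def index_entry(source: str, payload: bytes, how_obtained: str, obtained_on: date) -> dict:
    """Describe one downloaded payload and how it was obtained."""
    return {
        "source": source,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "bytes": len(payload),
        "how_obtained": how_obtained,
        "obtained_on": obtained_on.isoformat(),
    }


def add_to_index(index: dict, entry: dict) -> dict:
    """Add an entry keyed by payload hash; re-adding the same payload is a no-op."""
    index.setdefault(entry["sha256"], entry)
    return index


# Made-up example payload and provenance note.
root_index = {}
entry = index_entry(
    "github-issues",
    b'[{"number": 4}]',
    "manual export via the web UI",
    date(2024, 1, 5),
)
add_to_index(root_index, entry)
add_to_index(root_index, entry)  # second call changes nothing

print(json.dumps(root_index, indent=2))
```

Keying the index by content hash keeps it stable when the same export is collected twice during the week of gathering.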

michielbdejong commented 9 months ago

I should distinguish between data I already have but whose source has dried up, and recurring data sources which I should keep harvesting on a regular basis; make a list of those.
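
A list like that could start out as something this small. This is a sketch with assumed names: the `Source` fields, the harvest intervals and the `closed-account-export` entry are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Source:
    name: str
    recurring: bool                      # False means the source has dried up
    last_harvested: date
    interval_days: Optional[int] = None  # only meaningful for recurring sources


def due_for_harvest(sources: list[Source], today: date) -> list[str]:
    """Return the names of recurring sources whose harvest interval has elapsed."""
    due = []
    for s in sources:
        if s.recurring and s.interval_days is not None:
            if today - s.last_harvested >= timedelta(days=s.interval_days):
                due.append(s.name)
    return due


sources = [
    Source("bank-statements", recurring=True,
           last_harvested=date(2024, 1, 1), interval_days=30),
    Source("github-issues", recurring=True,
           last_harvested=date(2024, 2, 1), interval_days=7),
    Source("closed-account-export", recurring=False,
           last_harvested=date(2020, 6, 1)),
]
due = due_for_harvest(sources, today=date(2024, 2, 5))
```

Dried-up sources stay in the list so their existing payloads remain documented, but they never show up as due.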