andersbandt / Financial-Analyzer

Analyzes spending, budget, and investment data in a CLI
GNU General Public License v3.0
16 stars 4 forks source link

Database structure proposal #20

Closed nathan-hello closed 1 year ago

nathan-hello commented 1 year ago

I made this mockup today, mostly while at work. Merry Christmas.

https://dbdiagram.io/d/63a904107d39e42284e77729

This is a map of below. I haven't worked well with complicated relationships like what I'm about to explain, so apologies if the lines are wrong. I hope you get the idea. Also the types aren't meaningful in the picture, I just picked something.

This program, like most programs definitely fits into the "Create/View/Modify" structure of programming. We create the data, view it, and modify it. This is about what there is to create, which if you think about it is the entire mission statement of the program. Nobody asked for me to do this, but I thought hawk's structure needed some tweaking, and while doing that I thought of some ideas.

SQL Organization

user

Purpose: Identify users. Users should be able to share categories with other users.

category

Purpose: Group accounts together for frontend visualization and for backend data visualization/budgeting calculations. There should be a default category that accounts are placed in if they are not associated with a category or if their category is deleted.

account

Purpose: Accounts are used for representing their balance, understanding where a ledger_entry came from, and most importantly, point the overall program to the algorithm that understands their monthly statement (or however we're extracting information)

Users should be able to share their account with another user.

ledger_entry

One row in this table represents one transaction from an account. This should be the table into which all sorted files are pushed. Investments could potentially have their own table, but I'm not sure if that is justified. Someone might want to classify something as an investment that isn't obvious to any scripts we write.

spending_limit

One spending_limit entry represents one category to which it applies. There is a toggle for whether the budget should apply to that category's children. If a category has multiple inherited budgets, they can be resolved by making all of the limitations apply. Savings goals are not a part of this table; they are a column in the "account" table.

Alternatively, we could resolve budget conflicts in some interesting way. This could be where a budgeting scripting language could exist. There is very interesting potential here, I think.

andersbandt commented 1 year ago

Wow. Just wow. I appreciate all you effort on this writeup.

I think you did a good job of capturing my intents with the program. I have a few comments on this that I will get around too later.

andersbandt commented 1 year ago

Haha I feel very compelled to respond to this awesome write up so here's my comments :)

Again, great job on this.

nathan-hello commented 1 year ago

adding one category we will have to traverse up the tree and update the database for each category

Imagine we want to click on HEALTH and see its spending and the spending of all of its children separately. Or, we want to "collapse" a category in the UI. It makes sense to me to track both ends of the relationships because sometimes we'll want to do things from the bottom up, and other times we'll want to do things from the top down.

I would really like to track balance history which would require a ledger of balances

Agreed

Slightly confused on the name and display_name for the ledger table

Here's how I imagine UX is going to be for setting up:

Here's a New York Times charge on my card: NYTimes*NYTimes disc 800-698-4637 NY If I got that prompt, I would put it under MEDIA, and rename it NYTimes

I'm also on the fence between combining investment data and regular transactional data

Yeah, I think so too. The table doesn't really help with balances or history, which an investment account needs. Should we have a investment_account and a investment_ledger_entry? Or can the investment accounts be tracked in the accounts table?

lordlabakdas commented 1 year ago

Awesome work, @nate-llc

I took the liberty to extend/modify on @nate-llc 's db design and it can be seen at https://dbdiagram.io/d/63a9de987d39e42284e79708

Some changes a summarizd below:

Of course, the above argument of handling it inside of code assumes that the children_list and the trees column is required by the front end only sporadically. If these fields are used frequently, better to handle it inside the database, potentially using a db trigger.

nathan-hello commented 1 year ago

I appreciate the kind words and discourse around this post. Thank you all.

trees column in my opinion can be implemented inside of code (via joins). If we add it inside of db, then we might have to insert/update multiple tables

I didn't realize this, but yes, if we were to import/export trees, then that would affect the entire category table for that user. My reservation around constructing the tree from the category rows is that I'm not sure how you would make this exportable.

Having this kind of functionality isn't something I expect we're going to be doing any time soon. The important thing is that we construct the db in such a way where we can add this later. We don't want the twitter "in order for people to edit tweets we have to recreate the entire database" problem.

You seem like a better authority than I on this. I would just ask for an explanation on how someone could have multiple trees (like profiles) that they can swap around to, export, etc, if the tree if generated by the categories composing it.

to achieve the connected_users functionality

Thank you. Something I didn't mention in the original post is that I assume there would need to be more points of security than what I described.

Ideally, we would want to have lookup tables for institution name, permissions, and currency

I'm not sure how having a lookup table for 1:1 attributes like these would be more useful. I'm very much open to being convinced.

nathan-hello commented 1 year ago

https://dbdiagram.io/d/63aa5a897d39e42284e7a17a

Thank you hawk and labakdas for the discourse.

Changes to outline:

I also moved the boxes around. I would have liked to make the link an edit link, but you'd have to pay for the site to do that. The site seems pretty robust, allowing you to export what you make in plain SQL, Postgres, etc, so it might be worth. Or we rewrite it using the SQLAlchemy ORM. I would imagine learning enough sql to make the chart 'perfect' and exporting it would be easier.

But yea, that's the next draft of the tables.

andersbandt commented 1 year ago

It's looking really good, thanks for making those updates. I love that website too, really really nice for collaborating on this. I think we are getting closer and closer to a final database structure. I agree that the keywords relationship is a hard one to deal with. I'll let this draft sit with me and jot my next thoughts down when I have my chance.

I appreciate you spearheading the effort to get this locked in. This definitely does need to be the first priority, as the rest of the application will depend heavily on the structure of this database.

lordlabakdas commented 1 year ago

@nate-llc Personally, I do not think transactions and investments need a keywords field. They get it from the category table.

I assume account_id foreign key is UUID, similar to the id field in accounts.

We dont need an export from dbdiagram. SQLAlchemy will create out tables for us. https://docs.sqlalchemy.org/en/14/core/metadata.html

lordlabakdas commented 1 year ago

If we want to be fancy, possibly use a ulid instead of uuid. https://github.com/ahawker/ulid

nathan-hello commented 1 year ago

I do not think transactions and investments need a keywords field. They get it from the category table.

Unless I'm missing something, I don't think we could recover the keyword for a transaction if its corresponding category were deleted? This is why I think it's better to assign the keywords to the transactions/investments and allow the categories to find them

lordlabakdas commented 1 year ago

Excellent point, @nate-llc

If we decide to move the keywords to transactions, how do we populate the keywords? Still based on a category?

nathan-hello commented 1 year ago

A keyword is just a vehicle for tagging transactions so you can put groups of transactions (the group defined as whatever keyword they have in common) into a category. If there's another use for them, or if this isn't at all what @andersbandt was thinking, I'm more than willing to listen, but this is the impression I got from the discord convo the other day

Using the program, a user throws in their statement with transactions and the program gives them a prompt for every transaction that isn't already in the db. They can either put a keyword onto it (and if it's a new keyword, they would have to put that keyword into a category) or put it in a category directly

Tags are useful because if someone wanted to move their tag streaming service from ENTERTAINMENT to SUBSCRIPTIONS, they would move all of that tag in one action

andersbandt commented 1 year ago

re: keywords discussion I agree with Nate's last comment in the Discord that keywords could take the form of some intermediate table between the categories and transaction in the ledger. Here are a couple reasons I think that will be best

  1. we are unsure of the exact form that automatic categorization through keywords will take place. I think that restructuring only the keywords table would be easier than possibly having to restructure our ledger or categories tables if we choose to include keyword information in them
  2. This will possibly solve the problem of categories being deleted. Data would be retained in the keywords table.

A sample table structure could be like below???

id UUID category_id UUID keyword_strings JSON

I'm curious to hear what Nate was thinking would be the advantages of this method as well. And anyone else's thoughts.

nathan-hello commented 1 year ago

Sorry for the delay, I was gonna respond in discord but my response got kinda long

relevant discord :

category: healthcare Category keywords: pharmacy, hospital Transaction keyword: Tylenol, 1 main st, mom and pop store

Trying to square this circle, what I've come to is that what you're calling "transaction keyword" here is data that I wouldn't consider algorithmically relevant. if we have transactions > keywords > categories in a many:few:1 format, which is what I'm proposing, we can't have what you're calling transaction keywords here.

I now understand your goal in having multiple "sub"-keywords. Multiple keywords on a transaction is a major shortcoming that I was willing to say "this is a limitation: one keyword per transaction". Also the goal of pre-loading keywords can't happen if they're attached to transactions, because then we would have to pre-load transactions, which for the time being doesn't make a lot of sense.

My issues with having keywords based on category still stands, I think. I think your solution would work but there are the following restrictions.

I need some help with my solution, however. I can't quite get my head around the goals of the multiple keywords and from what I do understand, I'm not sure how to map.

The following are all the goals that I understand there to be. If we disagree on the goals or if there's something I'm missing, that's the more important thing to be talking about

Here's what I've got:

Solution: A keywords table. I'm not sure how the original keywords table was mapped and used in the original program, I'm very much curious what hawk thought about before all of this. It seems like if we want to treat keywords as a middle ground, then it makes sense to have a table between transactions and categories.

https://dbdiagram.io/d/63af64bf7d39e42284e84137

This is the only map I could think of, and I've tried. There must be some contradiction in the goals that I haven't understood yet or I don't know sql well enough. If this doesn't seem worth to anyone else, I can go with keywords in the category table. I just don't see it

lordlabakdas commented 1 year ago

My thoughts: Allow for a user to append multiple keywords to a transaction. Question: How do we determine relevance? Should there be a singular keyword for categorization and the rest for helpful data in some other way? I tried thinking of a solution with this kind of structure and couldn't crack it. I'm very much open to discussing it though. To answer this question, this is how I view it. Categories have a set of keywords. If system category, comes with a set of pre-filled static keywords. If a custom category, user fills keywords during creation or after or both. Similarly, transaction have its own set of keywords which can have an intersection with category keywords in some cases. Transaction keywords can be manually added by the user, in which case we assume relevance or auto-populated by the system (maybe later, once we have a robust dataset of transactions and verified categories) possibly by using NLP or similar methods. Investments follow the same logic with having their own set of keywords.

This is where having multiple keywords is most confusing for me. In this sense, a keyword is used as a middle ground. How could we have multiple middle grounds? The resulting keywords is the union of the category keyword and transaction keyword. Ideally, this must be a unique combination of keywords but it might also be common to have overlap with other transactions.

If a category were deleted, its keywords are still attached to the transactions. True in the above case.

Ability to move all of one keyword to another category. Just as keywords are helpful for data entry, they should be helpful for organization after data has been collected. True in the above case.

Any default data should be treated the same way as user-generated data. There is a difference. Default data is shared amongst users and hence non-deletable. User defined data can be deleted as it is present only in the particular user view.

Any preloaded keywords would have to be on corresponding categories, so if you delete a category and its keywords weren't in another category, the keywords go too. I would say making undeletable categories isn't something that we can accept. Both because it's not a good UX and because it speaks to the difficulty on potentially changing it down the road. Keywords are present in both categories and transactions. If categories are deleted, transactions can be identified using transaction keywords.

Conflicts in a transaction having multiple categories find the same keyword would be expensive to prevent. Maybe a user wants the transaction to fill two separate categories. It's this kind of control over data processing that I think is missing from the finance apps out there. Again, keywords on transactions would almost be worse at this because of the one-keyword limitation It depends on how we handle conflicts. Either we make conflicts mutually exclusive or we dont duplicate categories and allow for duplicated keywords inside categories. This is a rule we add to the system.

I'm not sure how having different keywords for category and transactions will work. How does Tylenol map to pharmacy? As a parent and a child. Parent keyword (category keyword) is pharmacy has a child keyword (transaction keyword) Tylenol.

nathan-hello commented 1 year ago

It sounds like all of my concerns are accounted for. I would like to revisit the difference between system-generated content and user-generated content, but we can certainly table that for when we go to actually write it. I imagine what you're thinking of is good, and if hawk is okay with all of this, I say send it.

My only request is to do a final draft on the db map, and then we can get to work on actually writing it

andersbandt commented 1 year ago

Thanks for creating this issue on database structure @nate-llc . The drafts you and @lordlabakdas made were great, along with the great discussion. This really cemented the idea that working "from the bottom up" is the best approach to tackle this new iteration of the financial application.

nathan-hello commented 1 year ago

https://dbdiagram.io/d/63aa5a897d39e42284e7a17a

Potential final draft. I edited the db diagram to include the keywords table. There's the connection from transactions/investments to category id in case of manual sorting. Does this look good? Specifically with the many/one relationships. @lordlabakdas

lordlabakdas commented 1 year ago

Awesome, @nate-llc

We can also add additional fields such as created_at and updated_at but can do that as we work on the implementation of the models.

lordlabakdas commented 1 year ago

Will close this issue. Can reopen if there is additional inputs/questions from members