fixmyrights / discord-bot

🤖 Discord Bot for the Right to Repair community.
BSD 2-Clause "Simplified" License

Ideal database #13

Open finnbear opened 4 years ago

finnbear commented 4 years ago

Please discuss here.

ChadBailey commented 4 years ago

IMO, let's just start building it as memory-resident. Then, if we find a compelling reason to move to a database, let's drive the decision of which one to choose off the particular challenge that comes up.

joaodforce commented 4 years ago

It cannot be memory-resident; the tracked bills must be persistent. The Custom JSON option is the best for the medium term because it makes the bot very simple to host, and we basically need to store only a very small object array.

We are also considering building a flag into the code to limit it to one request per server. That way we resolve the concurrency issues and also reduce the load on the bot.

ChadBailey commented 4 years ago

Sorry, I really just meant that the comparisons are done against an object held in memory. I didn't mean not to dump that to a .json file.

So, the way I was suggesting it: [new data] -> add to memory -> dump memory to .json file. On initial load -> load .json file to memory.

joaodforce commented 4 years ago

> Sorry, I really just meant the comparisons being done against an object held in memory. I didn't mean to not dump that to a .json file.
>
> So, the way I was suggesting is: [new data] -> add to memory -> dump memory to .json file. on initial load -> load .json file to memory

Oh sorry, I must have misread it, but I was still clearing up doubts about the other options.

We will test how it performs. We don't expect this bot to be very resource-heavy, but if you look at the code, this is basically how it is being done right now:

Loading the tracked bills from the file, checking/updating their status from the API, updating the file on disk, and then messaging the channels if there are updates.
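In pseudocode-ish JavaScript, that cycle looks something like the sketch below. The function names (`fetchStatus`, `notifyChannels`, etc.) are placeholders passed in as parameters, not the bot's real functions.

```javascript
// One pass of the update cycle: load bills, check each against the API,
// persist any changes, and message the channels if anything changed.
// All collaborators are injected, so this sketch stays self-contained.
async function pollOnce(loadBills, saveBills, fetchStatus, notifyChannels) {
  const bills = loadBills();                   // load tracked bills from file
  const changed = [];
  for (const bill of bills) {
    const status = await fetchStatus(bill.id); // check/update status via API
    if (status !== bill.status) {
      bill.status = status;
      changed.push(bill);
    }
  }
  saveBills(bills);                            // update the file on disk
  if (changed.length) {
    await notifyChannels(changed);             // message channels on updates
  }
}
```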

ChadBailey commented 4 years ago

Oh, I see... sorry, I haven't had a chance to get up to speed on the way it's working now. Are there any challenges with the current implementation? Any reason you might expect this to need to change soon?

joaodforce commented 4 years ago

> oh, i see... sorry i have not had a chance to get up to speed on the way it's working now. are there any challenges with the current implementation? any reason you might expect this to need to be changed soon?

The first problem we foresaw with this approach was concurrency: if, say, two requests to update the file came in at once, the resulting file would be written incorrectly.

With that problem in mind, we first thought about acquiring a lock for the file manipulation, but then we had an even better idea: limit the requests to one per server. That way we keep the bot's load down and resolve the concurrency issues.
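A one-request-per-server flag could look something like this sketch (guild IDs and the `doUpdate` callback are illustrative, not the bot's actual API):

```javascript
// Track which servers currently have an update in flight.
const inFlight = new Set();

// Returns true if the update ran, false if one was already in progress
// for that server (the concurrent request is simply dropped).
async function updateForGuild(guildId, doUpdate) {
  if (inFlight.has(guildId)) return false;
  inFlight.add(guildId);
  try {
    await doUpdate();
    return true;
  } finally {
    inFlight.delete(guildId); // always clear the flag, even on error
  }
}
```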

And given the scope of the project, there won't be a need for any database, because we are not looking to store the full history of status changes for the bills. We plan to keep it concise and only stream the data as it changes.

ChadBailey commented 4 years ago

I see what you mean about the concurrency problem... but that's actually an issue that extends beyond merely writing to a file - it exists before the data touches the physical storage layer. Said differently, if you have a concurrency problem when writing, you probably have a concurrency bug in your code.

Probably the easiest way to address this is by implementing a simple FIFO queue, such as this: https://stackoverflow.com/questions/1590247/how-do-you-implement-a-stack-and-a-queue-in-javascript

Sorry, I'm commenting before actually analyzing the code; I just felt that was an important thing to point out, since it may impact your current architecture.
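For what it's worth, in async JavaScript the simplest FIFO for serializing writes is often just a promise chain rather than an explicit array-backed queue. A minimal sketch:

```javascript
// Each submitted task runs strictly after the previous one finishes,
// so file writes can never interleave.
let queue = Promise.resolve();

function enqueue(task) {
  const run = queue.then(task);
  queue = run.catch(() => {}); // keep the chain alive if a task fails
  return run;                  // caller can await this task's result
}
```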

caldane commented 4 years ago

Why is a DB layer not viable at this point? I'm not sure I understand that part of the equation; if I understood why it isn't viable, I would better understand the problem.

finnbear commented 4 years ago

There is something nice about the entire database/config being stored in a human-readable JSON file.

ghost commented 4 years ago

As much as I love/prefer working with traditional SQL, the only benefit I see to using it would be the aforementioned solution to concurrency problems. Not all concurrency issues are bugs; if they were, enterprise storage arrays wouldn't exist.

Anyway, there is very little opportunity to de-duplicate fields within the JSON file. We would get some savings from the states (breaking them into their own table; I mean MA, TX, TN, etc.), and a lot of savings from not storing the same keywords 1,000 times over. But we could get around that problem far more easily by just passing the resulting JSON through LZMA before writing it to disk, and we'd get nearly the same space savings as from a normal database.

If we don't already, it would be a good idea to keep a separate database file (JSON or otherwise) of timers for when to refresh API objects. The API doesn't provide a cache timer within the JSON result (or at least I didn't see one when I checked), so we would need to record our own 'cached until' fields to know when to re-check for data.
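Something like this sketch, where the field names and TTL handling are hypothetical (the cache `Map` here stands in for whatever file the timers end up in):

```javascript
// id -> { data, cachedUntil } ; cachedUntil is a ms-since-epoch timestamp.
const cache = new Map();

// Store an API result along with our own expiry time.
function put(id, data, ttlMs) {
  cache.set(id, { data, cachedUntil: Date.now() + ttlMs });
}

// Return cached data, or null if missing/expired (meaning: re-fetch).
function get(id) {
  const entry = cache.get(id);
  if (!entry || Date.now() >= entry.cachedUntil) return null;
  return entry.data;
}
```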

TheDevMinerTV commented 4 years ago

I would recommend LokiJS because it saves the data as regular JSON files, is synchronous, supports "collections", and is lightweight and easy to use.

caldane commented 4 years ago

I get that its being human-readable is nice, but I have been told a DB is not viable. Document DBs are just as readable as JSON files, so that doesn't seem to be a valid reason to stay away from them. And double-clicking a file vs. double-clicking a collection doesn't seem to add complexity either. So if I understood the reason that DBs are not a viable solution, I think I could help come up with one.