Database buildout / Suggestions

xantari commented 2 years ago

Hi Fraesh,

I have a private repo I can grant you access to. It pulls daily updates from all the major sites and outputs them in a .json format I created a few years back. It would probably give you an easier global view of changes across sites, which you can then use to build out your database entries. The structure is a little different, as I wanted to support backward and forward linking. It is ultimately what I used in my Pincab.Configurator, so I can easily see what is new since I last turned on my cabinet and then update existing tables or install new ones.

My databases auto-merge any IPDB info that is missing from the source site, and they also carry the descriptions and authors from the source site.
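A minimal sketch of that "fill in what is missing" merge, assuming both records are plain dictionaries. The field names and sample values are hypothetical and are not taken from the actual database format:

```python
def fill_missing(site_entry, ipdb_entry):
    """Return a copy of site_entry with empty or absent fields taken from ipdb_entry.

    Values already present on the source-site entry always win.
    """
    merged = dict(site_entry)
    for key, value in ipdb_entry.items():
        if merged.get(key) in (None, ""):
            merged[key] = value
    return merged


# Hypothetical sample records, for illustration only.
site_entry = {"name": "AC/DC", "author": "some-author", "manufacturer": None}
ipdb_entry = {"name": "AC/DC (Pro)", "manufacturer": "Stern", "year": 2012}

merged = fill_missing(site_entry, ipdb_entry)
```

Here `merged` keeps the source site's `name` and `author`, and only `manufacturer` and `year` come from the IPDB record.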

It also addresses an issue I see in this database structure, namely the merging of multiple separate AC/DC tables. I originally went down a path similar to yours, but realized that a flat, node-linking structure is far more flexible: you can define a single wheel image, for instance, and link it to 5 different versions of a table (such as AC/DC). This structure also allows reverse node linking. If you pre-parse the forward links and then build the reverse links keyed by unique file URL, a table knows how to link back to its wheel, so you can get all related media whether you start from the wheel or from the table view.
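To illustrate the idea, here is a rough sketch of flat entries with forward links and a reverse index built by file URL. The field names (`url`, `type`, `links`) and the example URLs are made up for this sketch and are not the actual schema of the data models:

```python
from collections import defaultdict

# Hypothetical flat entries: every item (table, wheel image, ...) is a
# top-level node identified by its file URL; "links" holds forward links only.
entries = [
    {"url": "https://example.com/wheel/acdc.png", "type": "wheel",
     "links": ["https://example.com/table/acdc-premium",
               "https://example.com/table/acdc-le"]},
    {"url": "https://example.com/table/acdc-premium", "type": "table", "links": []},
    {"url": "https://example.com/table/acdc-le", "type": "table", "links": []},
]


def build_reverse_links(entries):
    """Pre-parse the forward links and return {target_url: [source_url, ...]}."""
    reverse = defaultdict(list)
    for entry in entries:
        for target in entry["links"]:
            reverse[target].append(entry["url"])
    return dict(reverse)


def related(url, entries, reverse):
    """URLs linked from `url` (forward) plus URLs that link to it (reverse)."""
    by_url = {e["url"]: e for e in entries}
    forward = by_url[url]["links"] if url in by_url else []
    return sorted(set(forward) | set(reverse.get(url, [])))


reverse = build_reverse_links(entries)
```

With this, `related(...)` on the wheel URL returns both table URLs, and `related(...)` on either table URL returns the wheel, even though only the wheel stores a link.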

Pincab.Configurator: https://github.com/xantari/PinCab.Configurator (you can look at the database manager, which if you point it to multiple database .json files auto-merges everything together and does the forward/backwards linking upon load)

Data models: https://github.com/xantari/VirtualPinball.Database.Models

VPS Converted spreadsheet: https://github.com/xantari/VPS.Database (this one is based off of the spreadsheet and auto-syncs daily)

Fraesh commented 2 years ago

Hey again :P

That sounds awesome! Are you scraping the websites with a cron job? I was thinking about that as well, since the manual upkeep is quite time-intensive. If so, could I access that file? That could make life a whole lot easier; maybe we can align on something.

As for the data structure: while developing, I messed around with a whole lot of approaches to find one that is practical as well as minimal in cost. That's where the nested structure came from. I prefer flat myself, but this style kept Firebase calls to a minimum back when I used Firestore. With this GitHub-based approach I could switch back to flat, though. It would, however, make in-browser sorting and filtering far more expensive, so I'm still not sure it is really worth it.

The merging will also still be an issue, since it's basically a human decision whether I declare tables to be one and the same or not. Adding an inheritance prop that links to a parent table will hopefully clear that up.
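One way such an inheritance prop could work, as a rough sketch only. The `parent_id` field name and the sample IDs are assumptions for illustration, not something that exists in the database today:

```python
# Hypothetical tables keyed by id; "parent_id" is the assumed inheritance prop.
tables = {
    "acdc": {"name": "AC/DC", "parent_id": None},
    "acdc-premium": {"name": "AC/DC (Premium)", "parent_id": "acdc"},
    "acdc-le": {"name": "AC/DC (LE)", "parent_id": "acdc"},
    "other": {"name": "Some other table", "parent_id": None},
}


def root_of(table_id, tables):
    """Follow parent_id links up to the top-most table; raises on a cycle."""
    seen = set()
    current = table_id
    while tables[current].get("parent_id") is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current}")
        seen.add(current)
        current = tables[current]["parent_id"]
    return current


def group_by_root(tables):
    """Return {root_id: sorted list of ids that inherit from it, root included}."""
    groups = {}
    for table_id in tables:
        groups.setdefault(root_of(table_id, tables), []).append(table_id)
    return {root: sorted(ids) for root, ids in groups.items()}


groups = group_by_root(tables)
```

Each variant stays its own entry, and the decision of what belongs together is stored explicitly in `parent_id` instead of being implied by nesting.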

xantari commented 2 years ago

Added you to the repo, use the second invite. It appears I can't set the repo to read-only mode on user repositories, so please don't write to it.

It's "website parsing" :-) Scraping seems to have negative connotations (even though it's what Google, Bing, etc. do to websites every day).

It's basically a really lightweight program that reaches out to VPF and VPU and just checks what's new. It takes about 10 seconds to complete and makes very few HTTP calls. It was written this way so that it's much more efficient than how Google/Bing crawl the whole damn site, which causes way more burden on the web server.

Do you have a Discord where I can reach you? Perhaps we can collaborate on the project, as I do a lot of web dev (I just haven't touched React yet, but have a basic knowledge of the Angular SPA framework).

Fraesh commented 2 years ago

Thx! That sounds awesome :) I'm on Discord alright; you can find me in the big Virtual Pinball Chat or just as Fraesh. The frontend repo is still private atm, as it has the whole Firebase config for authentication and is tied to my Firebase plan (running fine on the free tier, but yeah). I can make it "shareable" and hide that stuff.

That "parser" looks great! That would be a really awesome addition for the future, would love to collab on that :)

I'll hit you up once I get the frontend into a shareable state, if you wanna mess around a bit. It's a good simple thing to learn some React on if you're new to it. Bit of a switch from Angular though :D

stojy commented 2 years ago

hi @xantari

The 'last updated' scenario is definitely a good use case. I've implemented it in the ClrVpin project, and thus far it has proven very useful for locating tables that have recently changed. Plus, I've found that adding an 'exclude originals' option is rather useful for the VP 'purists'.
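As a rough illustration of that filter (not ClrVpin's actual code), assuming each entry has an ISO 8601 `updated_at` timestamp and an `is_original` flag, both of which are hypothetical field names:

```python
from datetime import datetime, timezone

# Hypothetical feed entries, for illustration only.
feed = [
    {"name": "Table A", "updated_at": "2023-01-10T12:00:00+00:00", "is_original": False},
    {"name": "Table B", "updated_at": "2022-11-01T08:30:00+00:00", "is_original": False},
    {"name": "Table C", "updated_at": "2023-01-12T09:00:00+00:00", "is_original": True},
]


def changed_since(entries, last_seen, exclude_originals=False):
    """Names of entries updated after `last_seen`, optionally skipping originals."""
    result = []
    for entry in entries:
        if exclude_originals and entry.get("is_original", False):
            continue
        if datetime.fromisoformat(entry["updated_at"]) > last_seen:
            result.append(entry["name"])
    return result


# Assumed "last time I looked" marker; a real tool would persist this somewhere.
last_seen = datetime(2023, 1, 1, tzinfo=timezone.utc)

recent = changed_since(feed, last_seen)
recent_no_originals = changed_since(feed, last_seen, exclude_originals=True)
```

Here `recent` contains Table A and Table C, while `recent_no_originals` contains only Table A.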

Regarding AC/DC (as an example), I suspect the existing hierarchical model works fine, as each AC/DC table is actually a unique table with its own IPDB reference. It's just that the current feed only supports a single IPDB reference.

And your website parsing feature sounds very interesting. It'd be great if that were merged into this feed to help reduce the manual involvement.

xantari commented 2 years ago

Closing issue…

Fraesh / vps-db

Database buildout / Suggestions #3