MalloyDelacroix / DownloaderForReddit

The Downloader for Reddit is a GUI application with some advanced features to extract and download submitted content from reddit.
GNU General Public License v3.0
514 stars 48 forks source link

Migrating to sqlite support? #47

Closed crccheck closed 4 years ago

crccheck commented 5 years ago

I know this is on your todo list. I've put some thought into this too and I'm wondering how I can help. I don't really have time right now, but maybe if we can split it up into small chunks it'll be easier to do.

To ORM or not to ORM?

Should we use the stdlib sqlite3 package or an ORM?

Initial migration path

  1. Hard migrations – we have a one time migration. One and done. This would mean everything has to get converted to sqlite
  2. Gradual migration – migrate one model at a time. RedditObjects, Redditors, Redditor preferences, Subreddits, Subreddit preferences, system preferences, etc.

Maintenance

Deleting old data? Vacuums? Schema migrations?

MalloyDelacroix commented 5 years ago

This is something that I have been giving a lot of thought to lately also. The sqlite migration is a huge stepping stone to a lot of other features that are planned for the future.

One of those features is adding the ability to download and store comments and text posts. I haven't quite worked out how to structure the database yet, but I am thinking tables for users, subreddits, posts, and comments. Posts can be connected to either users or subreddits depending on which they were downloaded from. Comments can be connected to users if the download performed specifies to download the users comment history or connected to posts if the app user specifies that downloaded posts should also download any comments associated with the post.

I would also like to allow the app user the ability to search the database in ways which would make the data collected useful, such as comment or post score, posts with so many comments, keywords in text, etc, and probably export the data in different formats.

The steps that I see are:

  1. Making the database scheme
  2. Making DAO's to access the db and build the models
  3. Alter the current state handler class to load from the DAO's at startup
  4. Alter the object updater to migrate the marshaled objects to sqlite
  5. Add new comment and post extraction features.
  6. Refine the GUI and settings to work with the new system and features

I am also a little too busy to get too deep into the migration right now, but I can get the initial scheme put together with a class to create the database. Then we can start working on creating DAO's to build the RedditObjects and move forward from there. I am usually a solo operation, so I am open to different strategies or input on everything.

MalloyDelacroix commented 5 years ago

I finally had time to get the initial database structure lined out. I've created a new branch, v3.0, and a class that creates the database. I've also made a database creator tool in the top level directory that uses the new class to create the database from the command line.

MalloyDelacroix commented 5 years ago

After starting with the initial path earlier this year, I believe I have had a change of heart regarding this issue. I am now firmly in the ORM camp. With the number of changes that will be required to make this work, I believe using an ORM will definitely be our best path to success. I think that SQL Alchemy is the right choice for the job.

I still currently have too much on my plate to make this massive update yet, so I am leaving this comment as more of a roadmap moving forward before anyone starts heading down the old path.