Bionus / imgbrd-grabber

Very customizable imageboard/booru downloader with powerful filenaming features.
https://www.bionus.org/imgbrd-grabber/
Apache License 2.0

Use a SQLite database for storing MD5s #2116

Open Bionus opened 4 years ago

Bionus commented 4 years ago

Suggestion

Currently, MD5s are stored in a TXT file that is fully loaded into RAM on startup.

This causes issues such as #2093, where RAM usage can balloon as the MD5 database grows. It can also slow start-up if the user is not using an SSD.

The suggestion is to switch to an SQLite database that would not be loaded into RAM, and would instead be queried when necessary.
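As a rough illustration of the idea (not Grabber's actual code, which is C++/Qt), here is a minimal sketch using Python's built-in `sqlite3` module. The table name `md5s`, its `md5` and `path` columns, and the helper function names are all hypothetical. The point is only that opening the database reads nothing into memory, and each check is a single indexed lookup.

```python
import sqlite3


def open_md5_db(path):
    """Open (or create) the database; no rows are read into memory here."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per MD5. The primary key doubles as the index.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS md5s (md5 TEXT PRIMARY KEY, path TEXT)"
    )
    return conn


def add_md5(conn, md5, path):
    """Record that the file with this MD5 was saved at `path`."""
    conn.execute(
        "INSERT OR REPLACE INTO md5s (md5, path) VALUES (?, ?)", (md5, path)
    )
    conn.commit()


def find_md5(conn, md5):
    """Return the saved path for this MD5, or None if it is unknown."""
    row = conn.execute("SELECT path FROM md5s WHERE md5 = ?", (md5,)).fetchone()
    return row[0] if row else None
```

With this shape, memory use depends on SQLite's page cache rather than on the number of stored MD5s.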

✔️ Pros

Cons

Alternatives considered

More efficient data structures

Currently, the MD5 database is loaded in RAM as UTF-16, and the data structure used is far from optimized for memory. It could be possible to save some RAM by optimizing it. However, that will never match not loading anything into RAM at all.
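To give a sense of the potential saving, the sketch below compares the payload size of one MD5 held as a UTF-16 hex string with the same digest held as raw bytes. It counts only the digest itself: per-string and container overhead are ignored, and storing raw bytes is an assumption about what "optimizing" could look like, not something decided in this issue.

```python
# MD5 of the empty string, written as 32 hexadecimal characters.
md5_hex = "d41d8cd98f00b204e9800998ecf8427e"

# UTF-16 uses 2 bytes for each of these characters.
utf16_size = len(md5_hex.encode("utf-16-le"))

# The digest itself is only 128 bits.
raw_size = len(bytes.fromhex(md5_hex))

print(utf16_size, raw_size)  # prints "64 16"
```

So a packed representation could cut the digest payload by about 4x, but the whole set would still have to be resident in RAM.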

Not loading the whole database in RAM

If the user changes the folder they want to save to, it's not necessary to check the MD5 database of other directories. We could load only the list of images saved in that directory. However, it's currently impossible to check what folder was used for saving images in the TXT database format.

Also, that requires re-loading lots of stuff at runtime when the target directory changes. Even if it's not often, it might cause issues and easily lead to bugs because of the added complexity.

Steps

Beta

In the next version, this feature will be included as a beta. TXT will still be the default format: if there is neither a TXT nor an SQLite file, a TXT one will be created.

However, the presence of an md5s.sqlite file in the settings directory will cause Grabber to use it instead of md5s.txt. A new "tool" will allow users who want to try it out to migrate from the TXT format to the SQLite format. This tool will not delete the TXT file.
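A migration tool of this kind could look roughly like the sketch below. This is not the actual tool. It assumes, without confirmation from this issue, that each line of md5s.txt starts with a 32-character hex MD5 and that whatever follows can be stored as opaque text. The `md5s` table, its columns, and the function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def _read_rows(txt_path):
    """Yield (md5, rest_of_line) pairs, one line at a time."""
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if len(line) >= 32:  # skip blank or truncated lines
                # ASSUMPTION: the first 32 characters are the hex MD5.
                yield line[:32], line[32:]


def migrate_txt_to_sqlite(settings_dir):
    """Copy md5s.txt into md5s.sqlite; return the number of rows stored."""
    settings_dir = Path(settings_dir)
    conn = sqlite3.connect(str(settings_dir / "md5s.sqlite"))
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS md5s (md5 TEXT PRIMARY KEY, path TEXT)"
        )
        with conn:  # one transaction: commit on success, roll back on error
            conn.executemany(
                "INSERT OR REPLACE INTO md5s (md5, path) VALUES (?, ?)",
                _read_rows(settings_dir / "md5s.txt"),
            )
        return conn.execute("SELECT COUNT(*) FROM md5s").fetchone()[0]
    finally:
        conn.close()
    # md5s.txt is only read, never deleted or modified.
```

Streaming the lines through a generator keeps the migration itself from loading the whole TXT file into memory.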

Release

If all goes well, in the version after the beta, SQLite will become the new default format for new users: if there is neither a TXT nor an SQLite file, an SQLite one will be created.

Deprecation

In the version after the "release" step, only SQLite files will be opened. On start-up, if a TXT file is found without an associated SQLite file, it won't be loaded, and a pop-up will appear strongly suggesting that the user migrate their database using the tool.

A flag will also be set to ensure the warning is repeated even if an SQLite file is later created by saving some random image. This flag will be cleared by either deleting the TXT file or using the migration tool.

The migration will not be forced, so that users who run into issues can keep using the TXT database for a while.
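The file-selection rules of the beta, release, and deprecation steps can be summarized in a short sketch. The function name, the phase strings, and the way the warning flag is passed in and returned are all hypothetical. Only the file names md5s.txt and md5s.sqlite come from the plan above. Clearing the flag from the migration tool is left to the caller.

```python
from pathlib import Path


def choose_md5_backend(settings_dir, phase, warn_flag=False):
    """Return (backend, warn_flag) for 'beta', 'release' or 'deprecation'."""
    settings_dir = Path(settings_dir)
    has_txt = (settings_dir / "md5s.txt").exists()
    has_sqlite = (settings_dir / "md5s.sqlite").exists()

    if phase == "beta":
        # SQLite only if the user opted in by having md5s.sqlite; TXT otherwise.
        return ("sqlite" if has_sqlite else "txt"), False

    if phase == "release":
        # Existing TXT-only users keep TXT; everyone else gets SQLite.
        use_txt = has_txt and not has_sqlite
        return ("txt" if use_txt else "sqlite"), False

    if phase == "deprecation":
        if not has_txt:
            warn_flag = False  # deleting the TXT file clears the flag
        elif not has_sqlite:
            warn_flag = True   # TXT without SQLite: not loaded, warn the user
        # TXT and SQLite both present: keep the flag as it was, so the
        # warning repeats until the user migrates or deletes the TXT file.
        return "sqlite", warn_flag

    raise ValueError(f"unknown phase: {phase!r}")
```

For example, in the deprecation phase a settings directory containing only md5s.txt gives `("sqlite", True)`: the TXT file is ignored and the warning is shown.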

Clean-up

In a future version, all code related to the TXT format of the MD5 database (loading, parsing, warnings, etc.) will be removed.


Feedback / alternative suggestions welcome 😄

brazenvoid commented 4 years ago

Is this available now in SQLite for use (with the tool)?

Bionus commented 4 years ago

The database is available (i.e. if there's an md5s.sqlite file, it will be used). However, the migration tool hasn't been created yet. It's the next thing I'll be working on, though.

Also, I'm not even sure it's available in the nightly, since the nightly build has been broken for a while: https://ci.appveyor.com/project/Bionus/imgbrd-grabber

So I'll have to fix that first 😅

brazenvoid commented 3 years ago

The DB has worked wonders for RAM usage: it went down from ~350MB with my 80MB MD5 file to ~70MB with a ~115MB DB.