babluboy / bookworm

A simple ebook reader for Elementary OS
GNU General Public License v3.0
1.32k stars 101 forks

Cache Management Improvement #315

Open brainchild0 opened 4 years ago

brainchild0 commented 4 years ago

I recently discovered that Bookworm artifacts are consuming an enormous amount of local storage in my user directory. They are residual extracted files from EPUB documents I opened previously. On my Linux system, they reside in ~/.local/share/com.github.babluboy.bookworm/.

At several gigabytes, these artifacts are enormous even by the standards of modern hardware. Because they also persist indefinitely, they impose a major usability obstacle for the application. I hope this can be addressed.

babluboy commented 4 years ago

@brainchild0 There is an option in the settings to disable the cache, which will prevent Bookworm from extracting the book contents into the cache for quicker reading later.

If the cache is disabled, a book in the library is extracted every time it is opened for reading. If the book has been moved from the location from which it was added, Bookworm will require you to add it again.

If the cache is turned on, Bookworm does not refer to the ebook file in its original location, so the trade-off is performance versus storage.

brainchild0 commented 4 years ago

The cache option is currently disabled. I don't recall ever changing it, but my memory may fail me. What is the default? Does the application design include cleaning the cache when the option becomes disabled?

babluboy commented 4 years ago

The cache is turned on by default. When the cache is disabled, the books directory is not emptied; that is something I can look at doing.

You can remove one of the books inside the “books” directory in the cache folder and then try to open it in the Bookworm library. Bookworm will not find the cache contents and will attempt to load the book from the location it had when it was added to the library. If the book is present in that original location, the contents will be extracted to /tmp and the book will be opened. The /tmp directory gets cleaned by the system on restart.
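The lookup order described above can be sketched roughly as follows. This is only an illustration in Python, not Bookworm's actual code (which is not shown in this thread); the function name, the `book_id` parameter, and the `bookworm_` temporary-directory prefix are all hypothetical. It relies only on the fact that an EPUB file is a ZIP archive.

```python
import tempfile
import zipfile
from pathlib import Path


def locate_book_contents(book_id: str, original_path: Path, cache_root: Path) -> Path:
    """Return a directory holding the extracted contents of a book.

    Hypothetical helper: the names and directory layout are assumptions.
    """
    # 1. Prefer the extracted copy in the cache's "books" directory.
    cached = cache_root / "books" / book_id
    if cached.is_dir():
        return cached

    # 2. No cached copy: fall back to the file's original location.
    if not original_path.is_file():
        raise FileNotFoundError(
            f"{original_path} is missing; the book must be added again"
        )

    # 3. Extract into the system temporary directory (typically /tmp on
    #    Linux), which the system is expected to clean up on restart.
    target = Path(tempfile.mkdtemp(prefix="bookworm_"))
    with zipfile.ZipFile(original_path) as epub:
        epub.extractall(target)
    return target
```

Deleting a book's directory from the cache and opening the book again would exercise steps 2 and 3 of this sketch.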

Let me know how it goes. I can use this ticket for doing the following improvements: 1- when cache is disabled, remove the cache contents 2- automatically check all books in the library and remove them if they are not present in their original location when added to bookworm 3- show some notifications of this action to the user
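As a rough illustration of those three improvements, here is a minimal Python sketch. It is not Bookworm's implementation: the library is modelled as a plain dictionary from a hypothetical book id to the original file path, the function name is invented, and "notification" is reduced to a list of messages for the caller to display. Whether step 2 should also run while the cache is enabled is not settled here; the sketch applies it unconditionally.

```python
import shutil
from pathlib import Path


def clean_library(library: dict[str, Path], cache_root: Path, cache_enabled: bool) -> list[str]:
    """Apply the proposed cleanup and return messages to show the user.

    Hypothetical helper: `library` maps a book id to its original path.
    """
    messages = []
    books_dir = cache_root / "books"

    # 1. When the cache is disabled, remove the cached contents.
    if not cache_enabled and books_dir.is_dir():
        shutil.rmtree(books_dir)
        messages.append("Cache disabled: removed cached book contents")

    # 2. Drop library entries whose original file is no longer present,
    #    along with any cached copy they still have.
    for book_id, original in list(library.items()):
        if not original.is_file():
            del library[book_id]
            shutil.rmtree(books_dir / book_id, ignore_errors=True)
            # 3. Record what was done so the user can be notified.
            messages.append(f"Removed '{book_id}': original not found at {original}")

    return messages
```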

brainchild0 commented 4 years ago

Now I understand what most likely happened in my case. I disabled the cache, but the artifacts remained.

> Let me know how it goes.

I need the storage space, so I simply cleared the entire contents.

> I can use this ticket for doing the following improvements: 1- when cache is disabled, remove the cache contents 2- automatically check all books in the library and remove them if they are not present in their original location when added to bookworm 3- show some notifications of this action to the user

Items (1) and (2) are certainly excellent proposed enhancements.

I would further suggest running periodic checks, including on application startup, to clean the cache; such checks would cover both of these operations. This behavior helps ensure that no orphaned cache items remain, especially if a purge failed to complete at some earlier time when the cache was disabled.

I would give item (3) a lower priority, since it is preferable not to distract the user with information about routine administrative operations. If, however, the application is running some resource-intensive background operation, then some subtle form of user notification is helpful.

Running this operation asynchronously obviously ensures a better user experience.
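To make the suggestion concrete, a startup check of this kind might look like the Python sketch below. It assumes, hypothetically, that each cached book occupies one subdirectory named after a book id and that the set of ids known to the library is available; neither detail is confirmed for Bookworm. A background thread stands in for whatever asynchronous mechanism the application would actually use.

```python
import shutil
import threading
from pathlib import Path


def purge_orphans(books_dir: Path, known_ids: set[str]) -> list[str]:
    """Delete cache subdirectories that no library entry refers to."""
    removed = []
    if not books_dir.is_dir():
        return removed
    # Materialize the listing first so deletions do not disturb iteration.
    for entry in sorted(books_dir.iterdir()):
        if entry.is_dir() and entry.name not in known_ids:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return removed


def purge_orphans_in_background(books_dir: Path, known_ids: set[str]) -> threading.Thread:
    """Start the purge on a daemon thread so startup is not blocked."""
    worker = threading.Thread(
        target=purge_orphans,
        args=(books_dir, set(known_ids)),  # copy the ids to avoid sharing state
        daemon=True,
    )
    worker.start()
    return worker
```

Calling `purge_orphans_in_background(...)` once at startup, and again on a timer, would cover the periodic case.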

Future improvements might include any of the following:

  1. Enforcing a configurable upper limit on storage use.
  2. Employing a proper cache-management strategy.
  3. Creating a method to avoid naming collisions, in case the same file name appears in multiple scanned directories.
  4. Making disabled caching the default, or at least limiting the default cache to some very modest size.
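Items 1 to 3 could be approached along the following lines. This Python sketch is only one possible reading: the eviction policy shown (oldest directory first) is an assumed stand-in for "a proper cache-management strategy", directory modification time is used as a crude proxy for when a book was last opened, and all function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def cache_key(source: Path) -> str:
    """Build a directory name from the full path, so that files with the
    same name in different directories get different cache entries."""
    digest = hashlib.sha256(str(source.resolve()).encode("utf-8")).hexdigest()
    return f"{source.stem}-{digest[:16]}"


def directory_size(directory: Path) -> int:
    """Total size in bytes of all regular files below `directory`."""
    return sum(f.stat().st_size for f in directory.rglob("*") if f.is_file())


def enforce_limit(books_dir: Path, max_bytes: int) -> list[str]:
    """Evict the oldest book directories until the cache fits in max_bytes."""
    entries = [d for d in books_dir.iterdir() if d.is_dir()]
    sizes = {d: directory_size(d) for d in entries}
    total = sum(sizes.values())
    evicted = []
    # Oldest modification time first (assumed proxy for least recently used).
    for d in sorted(entries, key=lambda d: d.stat().st_mtime):
        if total <= max_bytes:
            break
        shutil.rmtree(d)
        total -= sizes[d]
        evicted.append(d.name)
    return evicted
```

A real implementation would more likely record the time each book was last read rather than rely on file system timestamps.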

By the way, why use the full domain as the directory name? I have never seen such a convention in use, at least in a Linux environment, though Linux may be slower to adopt modern methods than commercial platforms.