Mozzo1000 / booklogr

A simple, self-hosted service to keep track of your personal library 📚
https://demo.booklogr.app
Apache License 2.0

Improvements slow OpenLibrary API calls #8

Open Mozzo1000 opened 1 month ago

Mozzo1000 commented 1 month ago

Background

The frontend calls the OpenLibrary API directly to fetch book information (title, description, ISBN, cover image, etc.) and is fully responsible for powering the frontend search, which lets users search for books by title or ISBN.
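For reference, here is a minimal sketch of the two public OpenLibrary endpoints involved: the search endpoint and the covers endpoint. It is written in Python for illustration only; the function names `search_url` and `cover_url` are hypothetical, and the exact requests booklogr's frontend makes are not shown in this issue.

```python
from urllib.parse import urlencode

# Public OpenLibrary endpoints for search and for covers looked up by ISBN.
SEARCH_URL = "https://openlibrary.org/search.json"
COVERS_URL = "https://covers.openlibrary.org/b/isbn"


def search_url(query: str, limit: int = 10) -> str:
    """Build a search URL for a free-text query (hypothetical helper)."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'limit': limit})}"


def cover_url(isbn: str, size: str = "M") -> str:
    """Build a cover image URL; OpenLibrary serves sizes S, M and L."""
    if size not in {"S", "M", "L"}:
        raise ValueError("size must be one of S, M, L")
    return f"{COVERS_URL}/{isbn}-{size}.jpg"


print(search_url("the hobbit"))
print(cover_url("9780261103344", "L"))
```

Every search and every cover shown in the UI is a round trip to one of these hosts, which is why their latency is directly visible to the user.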

Our backend does not talk to OpenLibrary. Everything is handled by the frontend, which then sends the information to the backend so that books can be saved in the user's library. The backend only saves the most necessary information, so we do not, for example, save cover images.

The problem

The OpenLibrary API is slow, which keeps us from fetching the information quickly enough to avoid hurting the user experience. Searching is exceptionally slow, and even fetching a cover image can take long enough to time out, so that in the end we get no image at all. This is okay for now: it works, and since it is a free service, the performance is nothing to complain about.

Proposed solution

OpenLibrary is open data and provides data dumps that can be downloaded and handled locally. There has been previous work on importing the data dumps into a database that is more easily searchable. Doing something similar, as well as serving all the cover images ourselves, should give us more control and shorten the distance to the data. We need to be able to do free-text search for titles as well as search by ISBN (we have thus far only focused on ISBN-13), and to retrieve different cover image sizes by ISBN.
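As a rough illustration of the search requirements above, here is a minimal sketch of a local database that supports free-text title search and exact ISBN-13 lookup. It assumes SQLite with the FTS5 extension compiled in (common, but not guaranteed, in Python builds). The table and function names are hypothetical, and this is not necessarily the approach taken in any existing import project.

```python
import sqlite3


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a plain table for ISBN lookup and an FTS5 index for titles."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE books (isbn13 TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    # Assumption: the SQLite build includes FTS5.
    conn.execute(
        "CREATE VIRTUAL TABLE books_fts USING fts5(title, isbn13 UNINDEXED)"
    )
    return conn


def add_book(conn: sqlite3.Connection, isbn13: str, title: str) -> None:
    conn.execute("INSERT INTO books (isbn13, title) VALUES (?, ?)", (isbn13, title))
    conn.execute("INSERT INTO books_fts (title, isbn13) VALUES (?, ?)", (title, isbn13))


def search_titles(conn: sqlite3.Connection, text: str, limit: int = 10) -> list:
    """Free-text search; every word in `text` must appear in the title."""
    # Quote each word so user input is not interpreted as FTS5 query syntax.
    terms = " ".join('"' + t.replace('"', '""') + '"' for t in text.split())
    if not terms:
        return []
    rows = conn.execute(
        "SELECT isbn13, title FROM books_fts "
        "WHERE books_fts MATCH ? ORDER BY rank LIMIT ?",
        (terms, limit),
    )
    return rows.fetchall()


def find_by_isbn(conn: sqlite3.Connection, isbn13: str):
    """Exact ISBN-13 lookup via the primary key."""
    return conn.execute(
        "SELECT isbn13, title FROM books WHERE isbn13 = ?", (isbn13,)
    ).fetchone()


conn = create_db()
add_book(conn, "9780261103344", "The Hobbit")
add_book(conn, "9780441013593", "Dune")
print(search_titles(conn, "hobbit"))
print(find_by_isbn(conn, "9780441013593"))
```

Keeping the ISBN as a primary key in a regular table makes that lookup a simple index hit, while the FTS5 table handles word-based title matching. Cover images are deliberately left out of this sketch.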

Some disadvantages of this solution are that OpenLibrary has millions of books in their database and the dumps are around 40 GB in size (excluding all cover images). We will have to develop a pipeline for creating the local database from the new dumps that are released regularly, and ingestion will probably take a long time.
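To give an idea of what the ingestion step could look like, here is a minimal sketch of a streaming parser for dump lines. It relies on the OpenLibrary dump layout of five tab-separated columns (type, key, revision, last modified, JSON record) and on edition records carrying `title` and `isbn_13` fields. The function name `iter_editions` and the output shape are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_editions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {isbn13, title} dict per ISBN-13 found in edition records."""
    for line in lines:
        # Columns: type, key, revision, last_modified, JSON record.
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5 or parts[0] != "/type/edition":
            continue  # skip authors, works and malformed lines
        try:
            record = json.loads(parts[4])
        except json.JSONDecodeError:
            continue  # skip records with broken JSON instead of aborting
        title = record.get("title")
        if not title:
            continue
        for isbn in record.get("isbn_13", []):
            yield {"isbn13": isbn, "title": title}


sample = [
    '/type/edition\t/books/OL1M\t3\t2020-01-01T00:00:00\t'
    '{"title": "The Hobbit", "isbn_13": ["9780261103344"]}\n',
    '/type/author\t/authors/OL1A\t1\t2020-01-01T00:00:00\t{"name": "X"}\n',
]
for edition in iter_editions(sample):
    print(edition)
```

Because the function is a generator, a real dump could be streamed through it with `gzip.open(path, "rt", encoding="utf-8")` without loading the roughly 40 GB file into memory. How long a full import takes would still need to be measured.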

More research on this topic needs to be done.

Mozzo1000 commented 1 month ago

I spent some time looking into this, and fixing the search experience was surprisingly easy. Work is being done in a separate repo, since there is no reason to build something that can only be used with this project. We might also want to let the user select which search API to use when installing the service. The cover images are an entirely different beast to handle: it looks to be around 1.2 TB of images, which is a lot more than a regular person wants to handle for this kind of service.

https://github.com/mozzo1000/openlibrary-local-db

https://fosstodon.org/@mozzo/112808589345794906

This issue will remain open even though most work is being done in a separate repo at the moment.