LibrePhotos / librephotos

A self-hosted open source photo management service. This is the repository of the backend.
MIT License

Separate photo addition from parsing #313

Open trowj opened 3 years ago

trowj commented 3 years ago

Describe the enhancement you'd like: Separate the process of finding new photos and adding them to the library from the process of parsing them.

Describe why this will benefit LibrePhotos: While working on my own (now abandoned) photo app a while ago, I found that processing in multiple passes improved usability, especially with large photo sets. The first pass just loops over the files, checks whether each one already exists in the library (hash comparison, paths, timestamps, whatever) and adds it if not. A second process then finds all unparsed photos and works on them (face recognition, tagging, locations, etc.). With this split, images were added faster: even if they weren't fully parsed, they were at least visible.
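The two passes described above could be sketched roughly as follows. This is a minimal illustration and not the LibrePhotos implementation: `PhotoRecord`, `add_new_photos`, `parse_pending`, the in-memory `library` dict, and the extension list are all hypothetical names chosen for the example, and a real backend would store records in its database.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path

# Assumption: only these extensions count as photos in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


@dataclass
class PhotoRecord:
    """Hypothetical library entry; not the LibrePhotos model."""
    path: Path
    content_hash: str
    parsed: bool = False
    metadata: dict = field(default_factory=dict)


def file_hash(path: Path) -> str:
    # Hash the contents so a renamed or moved file is still recognised.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def add_new_photos(root: Path, library: dict) -> int:
    """Pass 1: walk the tree and register unseen files. No heavy work here."""
    added = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        digest = file_hash(path)
        if digest not in library:
            library[digest] = PhotoRecord(path=path, content_hash=digest)
            added += 1
    return added


def parse_pending(library: dict, parsers: dict) -> int:
    """Pass 2: run every parser on records that have not been parsed yet."""
    done = 0
    for record in library.values():
        if record.parsed:
            continue
        for name, parser in parsers.items():
            record.metadata[name] = parser(record.path)
        record.parsed = True
        done += 1
    return done
```

Because the second pass only looks at the `parsed` flag, it never needs to walk the directory tree, and clearing that flag on a record is enough to have it re-parsed on the next run.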

Additional context: This also lends itself to separation of duties, which logically helps other areas. Re-parsing an image (on demand, or when a new parsing process is created, etc.) doesn't duplicate code or have to call the tree-walking code.

Additionally, for libraries like mine of not even 20k photos, I rescan nightly after syncing Google Photos. That process takes nearly 16 hours to finish! This change would let the photos show up faster and then have the metadata added when it becomes available (perhaps immediately if you have multiple heavy workers).
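As a rough sketch of the "multiple workers" idea, photos that have already been added could be handed to a worker pool for parsing. Everything here is an assumption made for illustration: `slow_parse` is a stand-in for the real parsing steps, `parse_with_workers` is a hypothetical helper, and a thread pool is used only to keep the example short (CPU-bound parsing would more likely use `concurrent.futures.ProcessPoolExecutor` or a task queue).

```python
from concurrent.futures import ThreadPoolExecutor


def slow_parse(photo_id: str) -> dict:
    # Stand-in for face recognition, tagging, geolocation and so on.
    return {"id": photo_id, "tags": [f"tag-for-{photo_id}"]}


def parse_with_workers(pending_ids: list, workers: int = 4) -> dict:
    """Parse already-added photos concurrently; results are keyed by photo id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so zip pairs each id with its own result.
        return dict(zip(pending_ids, pool.map(slow_parse, pending_ids)))
```

The photos stay visible the whole time; the workers only fill in metadata as each result comes back.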

akshay9 commented 3 years ago

I really like your idea of separating the flows. The current photo scanning implementation uses Python's multiprocessing, which also adds latency due to mutex locking in the PyTorch models and (maybe) also in the database.

I recently added the Semantic Search feature to LP. On the first run, the embeddings are calculated with a batch size of 64. I found that to be 20x faster than iterating over the 64 images and processing each one individually. A rescan of the image library, however, still uses a batch size of 1. Most of the machine learning model inference, such as face recognition and Places365, would benefit hugely from batching.
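The batching pattern mentioned above might look something like this. It is only a sketch: `batched`, `embed_batch`, and `embed_all` are hypothetical names, and `embed_batch` is a placeholder for a single model forward pass, not the actual Semantic Search code.

```python
from typing import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_batch(paths: Sequence) -> list:
    # Placeholder for one forward pass over the whole batch; a real model
    # would load the images and return one embedding vector per image.
    return [[float(len(p))] for p in paths]


def embed_all(paths: Sequence, batch_size: int = 64) -> dict:
    """Call the model once per batch instead of once per image."""
    embeddings = {}
    for batch in batched(paths, batch_size):
        for path, vector in zip(batch, embed_batch(batch)):
            embeddings[path] = vector
    return embeddings
```

With 130 photos and a batch size of 64, this calls the model three times (64, 64, and 2 images) rather than 130 times.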

derneuere commented 3 years ago

Like I said in #151 I like the idea! I think the adding of photos to albums could also be improved by working in batches.

trowj commented 3 years ago

> Like I said in #151 I like the idea! I think the adding of photos to albums could also be improved by working in batches.

My bad, I didn't see that when I quickly perused open issues! Feel free to close this one if you think it's too duplicative.