MohrJonas / obsidian-ocr

Obsidian OCR allows you to search for text in your images and pdfs
GNU General Public License v3.0
279 stars 5 forks

Design issue: OCR Producers vs Consumers #20

Closed evilgeniuschronicles closed 1 year ago

evilgeniuschronicles commented 1 year ago

This is more of an abstract design consideration than a code issue. Right now the plugin is tightly tied to the notion that the Obsidian instance with the plugin enabled is both the one scanning and the one consuming the indices. However, there is a big use case for consuming already-scanned indices.

For example, I have four Obsidian instances I use regularly: two desktops and two mobile devices. All sync via a repository stored in Dropbox. The mobile devices can't do the OCR, and I really don't want both desktops racing to scan when new graphics are added. However, I'd like OCR search to work on all of them, whichever instance I am currently using. This implies a way to designate one instance as the scanner and all the others as readers. The scanner calls the OCR provider and writes .ocr.json files; all the others are passive consumers that build a local search index from those .ocr.json files.
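The reader side of that split could be sketched roughly as follows. This is only an illustration in TypeScript: the `OcrSidecar` shape (a single `text` field), the `buildIndex` and `search` helpers, and the example paths are all assumptions, not the plugin's actual file format or API.

```typescript
// Hypothetical shape of a parsed .ocr.json sidecar; the plugin's real format may differ.
interface OcrSidecar {
  text: string;
}

// Sidecar path -> lower-cased transcript text, built locally on every instance.
type SearchIndex = Record<string, string>;

// Reader side: build a local index purely from sidecar files that already exist
// (written by the one scanning instance and delivered by sync). No OCR is run here.
function buildIndex(sidecars: Record<string, string>): SearchIndex {
  const index: SearchIndex = {};
  for (const path of Object.keys(sidecars)) {
    if (!path.endsWith(".ocr.json")) continue; // ignore everything but sidecars
    try {
      const parsed = JSON.parse(sidecars[path]) as OcrSidecar;
      if (typeof parsed.text === "string") {
        index[path] = parsed.text.toLowerCase();
      }
    } catch {
      // A half-synced or corrupt sidecar is skipped rather than aborting the build.
    }
  }
  return index;
}

// Case-insensitive substring search over the local index; returns matching sidecar paths.
function search(index: SearchIndex, query: string): string[] {
  const needle = query.toLowerCase();
  return Object.keys(index).filter((path) => index[path].includes(needle));
}
```

With something like this, a mobile instance would only ever call `buildIndex` and `search`; it never needs an OCR provider installed.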

This implies a few things:

evilgeniuschronicles commented 1 year ago

The simplest solution is probably a toggle for "Scan with this Obsidian". However, having different config files across synced repositories then becomes a problem for the user to solve. Specifically, excluding that one file from syncing should be possible for any user.

The building of the TranscriptCache for new files could be moved from the scanning process to the callback on file creation. Whether the file is written locally by the scanning instance or arrives via sync, the file creation should be handled the same way in every situation.
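A rough sketch of that idea, combined with the "Scan with this Obsidian" toggle: every instance reacts to file creation the same way, and only the instance with the toggle enabled runs OCR. All names here (`Settings.scanWithThisInstance`, `TranscriptCacheSketch`, `Host`, `handleFileCreated`, the extension list, and the `path + ".ocr.json"` sidecar naming) are hypothetical stand-ins, not the plugin's real classes or conventions. In an actual plugin the handler would presumably be wired to Obsidian's vault `create` event.

```typescript
// Hypothetical per-device setting; it would have to live in a file that is not synced.
interface Settings {
  scanWithThisInstance: boolean;
}

// Stand-in for the plugin's TranscriptCache: sidecar path -> transcript text.
class TranscriptCacheSketch {
  private entries = new Map<string, string>();
  add(path: string, text: string): void {
    this.entries.set(path, text);
  }
  has(path: string): boolean {
    return this.entries.has(path);
  }
}

// Everything the handler needs from its environment, injected so the sketch stays testable.
interface Host {
  settings: Settings;
  readText(path: string): string;
  runOcr(path: string): string; // returns the sidecar contents for an image or PDF
  writeText(path: string, text: string): void;
}

const SIDECAR_SUFFIX = ".ocr.json";
const SCANNABLE = [".png", ".jpg", ".jpeg", ".pdf"]; // assumed list of scannable extensions

// Called for every created file, whether it was written locally or arrived via sync.
function handleFileCreated(path: string, host: Host, cache: TranscriptCacheSketch): void {
  if (path.endsWith(SIDECAR_SUFFIX)) {
    // Same code path on scanner and readers: a new sidecar always feeds the cache.
    cache.add(path, host.readText(path));
    return;
  }
  const scannable = SCANNABLE.some((ext) => path.toLowerCase().endsWith(ext));
  if (scannable && host.settings.scanWithThisInstance) {
    // Scanner only: write the sidecar. Its own creation event then lands in the
    // branch above, so the cache is never filled directly from the scan step.
    host.writeText(path + SIDECAR_SUFFIX, host.runOcr(path));
  }
}
```

The design choice is that the scanner never touches the cache directly: it only writes sidecars, and the cache is filled from creation events on every instance, so the scanning desktop and a passive mobile reader end up running identical indexing code.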

MohrJonas commented 1 year ago

Interesting idea. However, this is not really something I personally have a use for. But if you want to implement it yourself, I'd be more than happy to help you out.