eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0
1.65k stars 127 forks

Support for other common formats #2798

Open KiARC opened 1 month ago

KiARC commented 1 month ago

I recently migrated to Docspell from paperless-ngx and it's been pretty great so far. Unfortunately I'm running into one issue: while Docspell can store my PowerPoint presentations, it can't index or display them. Paperless solved this by using Apache Tika as an optional extension, so to speak, and this seems like something Docspell could do as well, especially since it is already designed to work with external services (Solr). I would love Tika (and by extension, more supported formats) to be integrated with Docspell, either as a core feature or as an addon (and if such an addon exists I would appreciate a pointer to it, since the addon system is a bit opaque to me).

KiARC commented 1 month ago

A quick search shows that Tika is already partially used in Docspell, which is great. I saw a note by a maintainer who mentioned that the full Tika package is too big to be bundled with Docspell, but maybe a config option could be added to use an external Tika instance instead of the internal stripped-down one.
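For illustration only, here is a minimal Java sketch of what such an option might do under the hood. It assumes a standalone Apache Tika server (default port 9998) and its `PUT /tika` endpoint, which returns the extracted text when the request asks for `text/plain`. `TikaClient`, `buildRequest` and `extractText` are hypothetical names. This is not Docspell code and says nothing about how Docspell's extraction pipeline is actually wired.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Docspell): sends a document to an
// external Apache Tika server and returns the plain text Tika extracts.
public class TikaClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI tikaBase; // assumed base URL, e.g. http://localhost:9998

    public TikaClient(URI tikaBase) {
        this.tikaBase = tikaBase;
    }

    // Builds a PUT /tika request; "Accept: text/plain" asks Tika for plain text.
    // Tika detects the format itself, so no Content-Type is set here.
    public HttpRequest buildRequest(byte[] document) {
        return HttpRequest.newBuilder(tikaBase.resolve("/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Reads the file, sends it to Tika and returns the extracted text.
    public String extractText(Path file) throws IOException, InterruptedException {
        HttpRequest req = buildRequest(Files.readAllBytes(file));
        HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IOException("Tika returned HTTP " + resp.statusCode());
        }
        return resp.body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TikaClient presentation.pptx (needs a running Tika server)
        TikaClient tika = new TikaClient(URI.create("http://localhost:9998"));
        System.out.println(tika.extractText(Path.of(args[0])));
    }
}
```

Keeping Tika as a separate HTTP service, like Solr, means only its URL would need to be configurable; the trade-off is one more service to deploy and keep running.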

eikek commented 1 month ago

Hi @KiARC, PowerPoint is unfortunately not among the file formats Docspell supports. While Docspell can work with Solr for full-text search, adding another external service would still increase complexity a lot. I think there are two options for me: 1) it could be done as an addon outside of Docspell that is maintained separately, or 2) since Docspell already includes the Apache POI library, there is a good chance it supports at least some PowerPoint "dialects". In that case it could be done directly in Docspell rather than as an addon.

Neither variant is likely to happen soon, though, unless someone who is not me :-) gives it a try.

KiARC commented 4 weeks ago

I'm surprised that reporting a malicious comment like the one by @/RJS32 above isn't as easy as "This comment is malicious". For those who don't trust that link: good on you. It seems to be a phishing page that tries (badly) to trick users into installing what is presumably malware.