eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Re-triggering the processing #206

Closed (eresturo closed 4 years ago)

eresturo commented 4 years ago

Docspell grows in functionality in frequent release cycles, which I really enjoy. Version 0.9 introduced the feature that adds the text extracted by OCR as an overlay to the PDF, which works great. Unfortunately, I already have some documents that were processed in an older version. Is it possible to trigger reprocessing for them? An API call would be sufficient for me, a UI entry would not be necessary.

eikek commented 4 years ago

Yeah, I also thought about this, because I have the same "problem" :). I guess it would be better to only do this one thing: convert existing older PDFs. I'll play with this. I don't think it is too hard to do, and taking care of old data actually belongs to the feature itself, but this time I was too lazy :-). I'm not certain about any timeline here.

A (very) convoluted workaround for now would be to go to the item, upload the same file again and then remove the other one. (This is only for the times when you really want it …)

eresturo commented 4 years ago

Don't call yourself lazy, with all the work that's already gone into this :)

Thanks for the hint with the workaround.

I think such migration efforts will occur even more often. Maybe one could introduce a general upgrade procedure that performs the necessary migrations of existing data.

eikek commented 4 years ago

Yes, this is actually in place: there are SQL migration scripts (and fulltext index migration tasks) for plain data migration between releases. But this one needs to run through external processes and so requires some coding. All the pieces exist already; they must "only" be composed into a task which can then be dropped into the job executor.
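
To illustrate the idea of composing existing steps into one task, here is a minimal Python sketch. It is not Docspell code: `Attachment`, `compose`, `add_text_layer` and `run_migration` are hypothetical names made up for this example, and the conversion step is only a stand-in for the external OCR/convert process.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Attachment:
    # Hypothetical record for illustration; not Docspell's actual data model.
    id: str
    pdf: bytes
    has_text_layer: bool = False


# A step takes an attachment and returns the updated attachment.
Step = Callable[[Attachment], Attachment]


def compose(*steps: Step) -> Step:
    """Chain existing steps into a single task, applied left to right."""
    def task(att: Attachment) -> Attachment:
        for step in steps:
            att = step(att)
        return att
    return task


def add_text_layer(att: Attachment) -> Attachment:
    # Stand-in for the external process that would add the OCR overlay.
    return Attachment(att.id, att.pdf + b"+text", True)


def run_migration(attachments: Iterable[Attachment], task: Step) -> List[Attachment]:
    """Apply the task only to attachments that still lack a text layer."""
    return [att if att.has_text_layer else task(att) for att in attachments]
```

In this sketch, attachments that already have a text layer are passed through untouched, so running the migration a second time would not convert anything twice.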