eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0
1.56k stars 117 forks

[Feature request] Add "archive/hide" flag to hide documents from search #1404

Open arittner opened 2 years ago

arittner commented 2 years ago

Hi again, sorry for writing issues like on steroids.

I'm importing a lot of documents from my NAS and paperless-ng to docspell. Many documents are rather old and only archived, because I'm a digital native "messy" :-D. Anyway, I guess many users will have collected a lot of documents over time. But many documents become less relevant over time, even though you don't want to delete them. You could argue that documents will eventually disappear from the search list anyway due to the basic sorting, but if you search for certain document types or classifications, old documents will reappear.

I would be happy if you add an archive flag (bool) for this. I know you can also do this with Custom Fields, but they would be missing a feature that would still be important: An "archive flag" should work like the Trash flag in searches. If "archive" is not selected, we get only the "young" documents (not flagged). When the Archive checkbox is selected, the archived documents should be searchable.

Example in search: [image]

^- as alternative: "Include archived items in search" - in this case the search will run over all items in the DB (excluding Trash)
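The "include archived items" variant can be sketched as below. This is only an illustration of the requested behavior, not Docspell code: the `Item` class, its `archived` and `trashed` fields, and `visible_items` are all made-up names for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    archived: bool = False  # hypothetical flag proposed in this issue
    trashed: bool = False


def visible_items(items, include_archived=False):
    """Return the items a search should consider.

    Trashed items are always excluded. Archived items are only
    considered when include_archived is True.
    """
    return [
        it for it in items
        if not it.trashed and (include_archived or not it.archived)
    ]


items = [
    Item("receipt-2024"),
    Item("old-contract", archived=True),
    Item("junk", trashed=True),
]

print([it.name for it in visible_items(items)])
print([it.name for it in visible_items(items, include_archived=True)])
```

With the checkbox off, only the unflagged item is searched; with it on, the archived item is included as well, while the trashed one stays hidden in both cases.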

Example in edit:

[image]

^- proposal, maybe a "Folder" icon, and toggle state between "Archive / Un-Archive"

Why this workflow may make sense?

Many documents have different dates when they are no longer relevant for daily use. Receipts are relevant for 6 months if you want to keep track of the legal warranty. Tax documents are to be kept for 10 to 15 years, but I would be interested in finding them in my search now for 1-2 years at most. Terminated contracts can then be archived immediately.

I don't always want to see all of these documents together in my full text search. But that happens because they simply have different time relevance (from a few weeks to several years).

A Custom Field would not be considered automatically (unlike Inbox or Trash) in the searches. In addition, the Custom Field must first be added and then selected in each document, and you would always have to create bookmark queries for it.

I know that is a customization to the DB schema and always requires special consideration and thinking if the effort is worth it, but personally I think that just such a small field can clearly help to keep overview in the document clutter.

eikek commented 2 years ago

I have to admit I'm rather reluctant about that feature :). Mainly because it can be achieved (at least very closely) by existing functionality.

What I could imagine to add is something like a "default bookmark". That could act like a user or collective setting and would be selected by default in the search view. Then you can always opt-out by de-selecting it in the menu (I might miss some details :-) just thinking for the moment). This would make using a tag more convenient, with having a default bookmark that excludes all with an "archive" tag automatically.
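A rough sketch of how such a default bookmark might combine with a normal search, just to make the idea concrete. Everything here is an assumption for illustration: bookmarks are modeled as plain predicates rather than Docspell queries, and `effective_filters`, `search`, and the `hide-archive` name are invented.

```python
def effective_filters(user_filters, default_bookmarks, deselected=()):
    """Combine the user's filters with every default bookmark
    that has not been opted out of."""
    active = [
        predicate
        for name, predicate in default_bookmarks.items()
        if name not in deselected
    ]
    return list(user_filters) + active


def search(items, filters):
    """Keep only the items that satisfy all filters."""
    return [it for it in items if all(f(it) for f in filters)]


# Hypothetical default bookmark: hide anything tagged "archive".
default_bookmarks = {
    "hide-archive": lambda it: "archive" not in it["tags"],
}

items = [
    {"name": "invoice-2024", "tags": {"invoice"}},
    {"name": "invoice-2012", "tags": {"invoice", "archive"}},
]


def is_invoice(it):
    return "invoice" in it["tags"]


default_view = search(items, effective_filters([is_invoice], default_bookmarks))
opted_out = search(
    items,
    effective_filters([is_invoice], default_bookmarks, deselected={"hide-archive"}),
)

print([it["name"] for it in default_view])
print([it["name"] for it in opted_out])
```

The default view hides the archived invoice; de-selecting the bookmark brings it back without changing the user's own query.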

If it is really an "archive" that you don't want to see most of the time, I would definitely recommend a folder. A folder is better supported when doing fulltext search.

Wdyt?

arittner commented 2 years ago

I'll try to explain my motivation for coming up with the feature request:

I had actually thought of a few things to implement with existing resources. Somehow I kept coming back to the "Flag Archived" because there are simply already two functions that work exactly the same way in principle: Inbox and Trash. And since neither inbox nor trash are implemented as a tag or folder (which could have been done using the same arguments you described), I thought the "Archive Flag" fit better into the overall concept.

But I had something else in mind: a synthetic "Archive" flag (under the control of the Docspell system with an exactly specific functionality) additionally allows controlling how the documents are then stored. In the future, it would be possible to have a second Lucene index for archived items. That would be for weak systems, possibly also a relief of the search. But I admit, this brings out the architect in me, because I have a lot to do with ElasticSearch and OpenSearch and larger amounts of data :-D

By the way, I also thought that it would be useful to have a "default" search criterion that is always applied (unless you manually control it) to achieve exactly the effect I have in mind.

However, I had discarded this (for the above reasons).

Still, if going that way, what would it take to get it to work?

Optionally, it would be great to have something like action macros that add some values to an item when you run them:

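Examples:

* Action Item: "Archive": Update of item: Add Tag "archive", remove "Due date" value, Set Inbox=false, set Trash=false

* Action Item: "Invoice": Add Tag "invoice", add custom field: "amount"

* Action Item "Handover to x": Set Folder to "Folder-of-X", Set Inbox = true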

To make it simple, it could just be a JSON that describes fields and values to update. If a field is not mentioned, it remains as it is, if you use null (Nil) the field is deleted, otherwise the value is entered. Only Tags and Custom Fields are a little bit special:

And if someone manages archives as custom fields:

(additionally, "remove-tags" and "update-tags" are needed)

^- ok, I admit this "Action macros" expands the feature request quite a bit, but just imagine what cool workflows a user can create.
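To make the update rule concrete (field not mentioned: unchanged, null: deleted, anything else: set), here is a minimal sketch. It is not a proposed format. The key names `due-date`, `inbox` and `custom-fields` are assumptions, and so is the reading that "update-tags" adds tags while "remove-tags" removes them.

```python
import copy
import json


def apply_macro(item, macro):
    """Return a copy of item with the macro applied.

    Key absent from macro: field stays as it is.
    Value is null (None):  field is deleted.
    Any other value:       field is set to that value.
    Tags and custom fields get special handling (assumed semantics).
    """
    result = copy.deepcopy(item)
    for key, value in macro.items():
        if key == "update-tags":
            # Assumption: "update-tags" adds the listed tags.
            result["tags"] = sorted(set(result.get("tags", [])) | set(value))
        elif key == "remove-tags":
            to_remove = set(value)
            result["tags"] = [t for t in result.get("tags", []) if t not in to_remove]
        elif key == "custom-fields":
            # Same rule per custom field: null deletes, otherwise set.
            fields = result.setdefault("custom-fields", {})
            for name, field_value in value.items():
                if field_value is None:
                    fields.pop(name, None)
                else:
                    fields[name] = field_value
        elif value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


# Hypothetical "Archive" macro, loosely following the first example above.
archive_macro = json.loads(
    '{"update-tags": ["archive"], "due-date": null, "inbox": false}'
)

item = {
    "name": "contract",
    "tags": ["insurance"],
    "due-date": "2022-03-01",
    "inbox": True,
}
archived = apply_macro(item, archive_macro)
print(archived)
```

Because the macro is plain JSON, a button in the UI would only need to look up the macro and send the resulting update; the original item is left untouched until then.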

This is all brainstorming right now, thinking about how to make this comfortable.

Your idea of establishing default bookmarks and then possibly having "action macros" (which appear as buttons to the left of [Confirm], for example) would probably be an outstanding feature in this combination, without adding further synthetic features. Also, you catch a lot of new ideas (like the archiving feature) and can point to the fact that the user can easily "build in" that themselves with two configurations.

[image]

Funnily enough, I am now more convinced and excited about this idea instead of establishing an "Archive Flag". However, it does look like more work. But this work can be used for dozens of workflows and not only for archiving documents.

eikek commented 2 years ago

> And since neither inbox nor trash are implemented as a tag or folder (which could have been done using the same arguments you described), I thought the "Archive Flag" fit better into the overall concept.

Yes, you are very right. The inbox flag is from the very beginning - I wouldn't add it like this again. But since it's there I don't care to remove it either :) The trash thing has a meaning to the server/application, so it's not that straightforward to replace with a tag - well, it is ofc, the user could configure which tag means "trashed" etc. But to me this is ok as it is.

> But I had something else in mind: a synthetic "Archive" flag (under the control of the Docspell system with an exactly specific functionality) additionally allows controlling how the documents are then stored. In the future, it would be possible to have a second Lucene index for archived items. That would be for weak systems, possibly also a relief of the search. But I admit, this brings out the architect in me, because I have a lot to do with ElasticSearch and OpenSearch and larger amounts of data :-D

Ah, right - that totally makes sense! Hm, now I might change my opinion on this archive flag… :) I mean, it would be really nice to split storage based on these things and make searching faster. Especially with #1379 it would be possible to store archived files on different storage. OTOH, not sure how many people really have that much data right now. (just to weigh this against other things in the list)

> Optionally, it would be great to have something like action macros that add some values to an item when you run them:
>
> Examples:
>
> * Action Item: "Archive": Update of item: Add Tag "archive", remove "Due date" value, Set Inbox=false, set Trash=false
>
> * Action Item: "Invoice": Add Tag "invoice", add custom field: "amount"
>
> * Action Item "Handover to x": Set Folder to "Folder-of-X", Set Inbox = true

Nice! I like this, I think this is specifying the details for #1266.

> Funnily enough, I am now more convinced and excited about this idea instead of establishing an "Archive Flag". However, it does look like more work. But this work can be used for dozens of workflows and not only for archiving documents.

👍🏼 Yes, probably more work - the ui is always lots of effort. But I really like the idea. And as you said, it can be used for many things, which is really nice.

So… I now like both ideas :) If I just had more time. I think we can add all to the backlog and see how it goes? It can be done in multiple steps. The default bookmark is nice to have on its own. Bookmarks are "first class", they can be retrieved by id from the server. I think it "only" needs client-side work - the list of default bookmarks can be held in client settings (stored on the server as opaque json), or the db could be extended. The action macros are clearly a bit more work, especially the ui for creating them. But then it should be just running a list of requests.