eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Create ASN (archive serial number) as autoincrementing Metadata #924

Open mirisbowring opened 3 years ago

mirisbowring commented 3 years ago

According to the docs, one should manually add an ASN via the custom fields function in the UI. This has some "problems":

  1. I need to know at which ASN I stopped last time when adding a new one to an additional document
  2. I cannot easily search and order by ASN
  3. Currently, many more steps are necessary to achieve such an elementary result (which makes for a bad user experience).

Coming from Paperless-NG, one has a metadata field for the ASN which is empty by default. If I decide to archive my scanned document, I just press the +1 button and the backend selects the next ASN and inserts it into the field (then I just press save after noting the number on the document).

The ASN should be an index / come from an indexed table. For example: if I create an ASN (e.g. 12), save the document and exit, and later delete the ASN because I dropped the physical copy (I don't need it anymore), the next number generated should be 13, not 12 (even if 12 no longer exists), because it is "reserved" by the dropped hard copy.
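A minimal Python sketch of this "never reuse a number" rule. All names here (`AsnRegistry`, `assign`, `clear`) are hypothetical and not part of docspell or Paperless-NG; the only point is that the counter is stored separately from the documents, so clearing an ASN does not roll the counter back.

```python
class AsnRegistry:
    """Hands out archive serial numbers that are never reused (illustrative only)."""

    def __init__(self, last_issued: int = 0) -> None:
        # High-water mark: the largest ASN ever handed out, kept
        # independently of which documents still carry an ASN.
        self.last_issued = last_issued
        self.asn_by_doc = {}  # document id -> ASN

    def assign(self, doc_id: str) -> int:
        """Give doc_id the next number; the counter only ever moves forward."""
        self.last_issued += 1
        self.asn_by_doc[doc_id] = self.last_issued
        return self.last_issued

    def clear(self, doc_id: str) -> None:
        """Remove the ASN from a document (e.g. the hard copy was destroyed)."""
        self.asn_by_doc.pop(doc_id, None)


registry = AsnRegistry(last_issued=11)
first = registry.assign("invoice-fridge")  # receives 12
registry.clear("invoice-fridge")           # physical copy dropped
second = registry.assign("invoice-tv")     # receives 13; 12 stays reserved
```

Computing the next number as "highest ASN still present + 1" would hand out 12 again in this scenario, which is exactly what should be avoided.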

I hope my point is understandable :)

eikek commented 3 years ago

I think the feature is a little different than what I was aiming at. The ASN in my case exists before the document is added to the application; i.e. it is printed on the paper. So the source of this id is outside of the application. Then it is natural to simply copy&paste it into a custom field (at some point, I hope to be able to detect it).

If you just need an ID from the application, why must it be serial? Each item already has an ID. You could note the first 6 or 8 characters on your document and then search it via a query like id:6hMuR2*.

eikek commented 3 years ago

Ok … just realized that it must be serial to make any sense at all, sorry! Was a long day today ;-) I'm not sure if I want to add this though (currently); need to think more about it. An alternative would be to buy a stamp that auto-increments (they are somewhat expensive, but work great).

mirisbowring commented 3 years ago

For me, this would be an important feature. Most of my invoices and documents can be destroyed because I only need them digitally for my files. But some invoices, like the ones from an electronic store, where I bought a Fridge/TV/Phone, etc. must be kept physically to exchange the device in a warranty case.

For such a document, I would generate this ID, note it on the invoice and keep the invoice in a physical folder, for example.

An alphanumeric ID like the one you mentioned would not fit my case. Imagine you have 400 physical documents that are identified via a random ID. Now you search docspell for the required document and see ID 6hMuR2. Then you must find the physical copy among 399 other random IDs.

If the ID were e.g. 237, I could open the folder (knowing it must be somewhere in the middle) and flip directly to the desired document.

Handling this ID manually would increase complexity and reduce user experience.

Technically you would need a new id_seq_asn Postgres table which would be referenced by ID as unique in the "Document" table (next to filename, etc.). When pressing the "Add and increment" button, the application should increment the ID and write it to the desired row. Besides this, deleting the ID (clearing the text field and pressing save) must be possible too, in case a physical document was purged.

Sadly I've never worked with Scala or Elm and therefore cannot create a quick PR.

I'll probably have some spare time in the next weeks to try to make such a PR.

eikek commented 3 years ago

Yeah, I see. I realized after writing the first comment that a serial number is needed; added this insight as a second comment :-). My brain didn't serve me too well …. The alternative using a stamp like this one is still interesting. It can put incrementing numbers on your paper. I do it this way, then I scan the document. It is very similar, but involves another physical device. User experience (at least for me) is very nice!

It is surely not a difficult feature to add, but it is also not very easy. I don't want to add another mandatory field, I'd rather solve it via a custom field I think. Also, docspell supports h2, postgres and mariadb, so all these must be accounted for (and sequences should probably be per collective). I'm not saying I'm completely against it, but it probably won't happen soon. If it is added, I want to think about how to properly integrate it into the current app.

mirisbowring commented 3 years ago

Then I would have yet another device lying around after I stamped all the documents :/

This field should not be mandatory. Many documents don't need an ASN if not kept physically. With this ASN one could also track how many physical documents exist.

I understand the postgres, H2, mariaDB problem. Probably that could be a collective option. I don't know the exact DB-Design.

But if you have a table where the collectives are held, you could add a column "ASN". This would be numeric and start at 0.

All users within this collective would share the same index (the one from the collective). In this case the application should perform atomic operations. E.g. user A of the collective indexes a common document. The backend would run update ASN set value (select current value + 1) where collective = collective (pseudo code). In this case, this would be an atomic operation due to the table lock. Afterwards, the updated value gets returned to the client / inserted into the document row. When user B indexes a private document (in a private folder), he would get the ASN 2, even if this is the only document he ever indexes.

A user ASN would not work if B e.g. makes his indexed document common and there is already a document with ASN 2.
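The pseudo code above could look roughly like this. It is a sketch in Python with an in-memory SQLite database; the table and column names (`collective`, `asn`) and the function `next_asn` are assumptions for illustration and say nothing about docspell's real schema. The increment and the read happen inside one transaction, so the value returned is the one this call produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collective (id TEXT PRIMARY KEY, asn INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO collective (id) VALUES ('family')")
conn.commit()


def next_asn(conn: sqlite3.Connection, collective_id: str) -> int:
    """Increment the collective's counter and return the new value."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE collective SET asn = asn + 1 WHERE id = ?", (collective_id,)
        )
        if cur.rowcount != 1:
            raise KeyError(f"unknown collective: {collective_id}")
        # Still inside the transaction opened by the UPDATE above.
        row = conn.execute(
            "SELECT asn FROM collective WHERE id = ?", (collective_id,)
        ).fetchone()
    return row[0]


asn_user_a = next_asn(conn, "family")  # user A indexes a shared document
asn_user_b = next_asn(conn, "family")  # user B indexes a private document
```

Because the counter lives on the collective and not on the user, B's private document receives the next number of the shared sequence.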

eikek commented 3 years ago

> Then I would have yet another device lying around after I stamped all the documents :/

I keep having documents to store physically. I usually store contracts or things with a bit more money involved. In your use case, do you have a fixed number of such documents and once they are processed, no more are arriving?

To me the process with the stamp is much better. When I deal with physical documents (especially many), I put the stamp on it, scan it and put it away; and proceed with the next. Then I'm done with the physical process. The "digital" one in the app can happen any time later. With the process you describe, I scan the document, then go to the app, wait for the document being processed to get a number to put on the physical document, and then I can put it away - there is a break between the "digital" and "analog" process. I would need to use my phone in between scans of multiple documents or head to the computer. Not ideal for me :-).

We could also think about having some sequence generator on the phone, then you can put the number on the paper before scanning without using a stamp.

> This field should not be mandatory. Many documents don't need an ASN if not kept physically. With this ASN one could also track how many physical documents exist.

This is possible already using custom fields. You can search via f:asn:* for example. It is not tied to automatically retrieving the next sequence number. Custom fields are a good fit I think, because they are not stored on the item itself.

> I understand the postgres, H2, mariaDB problem. Probably that could be a collective option. I don't know the exact DB-Design.
>
> But if you have a table where the collectives are held, you could add a column "ASN". This would be numeric and start at 0.

Right now I think I would rather use a separate table. A sequence is not required, so I think it's better not to have many null or unused values, but rather use a separate table to store these sequences for collectives that need them (should one have many collectives). It also makes rollbacks to a previous version possible.

> All users within this collective would share the same index (the one from the collective). In this case the application should perform atomic operations. E.g. user A of the collective indexes a common document. The backend would run update ASN set value (select current value + 1) where collective = collective (pseudo code). In this case, this would be an atomic operation due to the table lock. Afterwards, the updated value gets returned to the client / inserted into the document row. When user B indexes a private document (in a private folder), he would get the ASN 2, even if this is the only document he ever indexes.
>
> A user ASN would not work if B e.g. makes his indexed document common and there is already a document with ASN 2.

Of course, this must be an atomic operation, but I think this is not the problem. All dbms I know have sequences, and if not, we can use transactions. To me, the difficulty is more how to properly integrate it into the application, UI/UX/API wise. For example, we could use a sequence per custom field (have a field type "sequence") instead of one single sequence per collective (a user based sequence won't work, of course, for the reasons you described). Just a thought, though, need more time :) Only to show there are many ways and I'm not sure which one to go with.
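One way to picture the "sequence per custom field" idea together with the separate table: a counter row is created lazily, only for the collective and field that actually ask for a number. This is a sketch in Python with in-memory SQLite; `field_sequence`, its columns, and `next_value` are hypothetical names, not docspell's schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE field_sequence (
        collective TEXT NOT NULL,
        field_name TEXT NOT NULL,
        last_value INTEGER NOT NULL,
        PRIMARY KEY (collective, field_name)
    )
    """
)
conn.commit()


def next_value(conn, collective, field_name):
    """Return the next number of the sequence owned by (collective, field_name)."""
    key = (collective, field_name)
    with conn:  # one transaction: create the row if missing, bump it, read it
        conn.execute(
            "INSERT OR IGNORE INTO field_sequence "
            "(collective, field_name, last_value) VALUES (?, ?, 0)",
            key,
        )
        conn.execute(
            "UPDATE field_sequence SET last_value = last_value + 1 "
            "WHERE collective = ? AND field_name = ?",
            key,
        )
        (value,) = conn.execute(
            "SELECT last_value FROM field_sequence "
            "WHERE collective = ? AND field_name = ?",
            key,
        ).fetchone()
    return value


a1 = next_value(conn, "family", "asn")
a2 = next_value(conn, "family", "asn")
b1 = next_value(conn, "family", "box-number")  # independent sequence
rows = conn.execute("SELECT COUNT(*) FROM field_sequence").fetchone()[0]
```

Collectives or fields that never request a number leave no row behind, which matches the wish to avoid null or unused values.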

mirisbowring commented 2 years ago

Hi again :) Docspell has evolved a lot. Since Paperless-NG lacks user management, it is no longer suitable for me and I am continuing to migrate to docspell.

You mentioned a "manual stamp":

> The alternative using a stamp like this one is still interesting. It can put incrementing numbers on your paper. I do it this way, then I scan the document. It is very similar, but involves another physical device. User experience (at least for me) is very nice!

Does Docspell have the ability to autodetect the stamped number and add it to a custom field?

Or alternatively: are you still considering adding an "indexed" custom field?

Regards!

eikek commented 2 years ago

Hi!

No, unfortunately docspell can't detect the stamp yet :/. This is something I would really like to have myself :). At least, it is recognized as text and I can quickly select it by a double-click in the pdf. I haven't decided yet how to implement it; maybe via external plugins - something on my list from the beginning … maybe the last "big" feature ;-).

It's a bit similar to the auto-index field. I would consider it, because I think it can fit well into the custom-fields, but it is not high on the list. Glad to have an open issue for this, just be aware that I have no idea when this is being worked on.

madduck commented 1 year ago

For completeness, here is the Paperless-NGX workflow which is indeed quite beautiful.

It's important to distinguish between ASNs and other types of numbers that might pre-exist on the document. Those aren't "serial numbers", and I would kindly ask to refrain from letting them take over the ASN concept; custom metadata can hold those.

But there are also two types of ASNs: external (e.g. the auto-incrementing stamp), or internal, and I think it should be a property of a collective which one is used. The two cannot really be combined.

Here is what the UI could look like for external ASNs:

image

Bonus points for background API calls to check uniqueness, and have the UI blink in orange and pink while the number isn't unique.

With internal ASNs, it could be just this:

image

And when the button is clicked, it gets replaced with the generated ASN that cannot be edited any longer:

image

In the backend, maybe this is a viable approach?

Even though all three databases support auto-incrementing serial numbers, using those would mean that the serial numbers would be globally unique, but I think they should be successive within a collective. I don't want to have my family documents be numbered 13, 57, 312 just because a lot of documents were filed in other collectives in between. Documents should be numbered incrementally so that when 13 and 15 are next to each other, you can and should wonder where 14 might be.

  1. Therefore, using the idea of @mirisbowring above, create a new table that keeps track of the highest serial per collective.

  2. Add a DEFAULT NULL asn column to the documents table, with a UNIQUE constraint across (collid, asn).

  3. Provide an API endpoint assign_asn, which takes a document ID as input, and uses transactional database integrity to

    1. Generate a new number using the table suggested by @mirisbowring;
    2. Update the documents table to set the asn field accordingly;
    3. Return the new ASN as the result of the API call, which the UI can use to update itself.
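The three steps could be sketched like this. It is Python with in-memory SQLite and only an illustration: `asn_counter`, `documents`, `collid`, and the `assign_asn` function are names taken from or modeled on the proposal above, not an existing docspell API, and the sketch assumes a counter row already exists for the collective. The counter bump and the document update share one transaction, and the `UNIQUE (collid, asn)` constraint acts as a safety net.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asn_counter (collid TEXT PRIMARY KEY, highest INTEGER NOT NULL);
    CREATE TABLE documents (
        id     TEXT PRIMARY KEY,
        collid TEXT NOT NULL,
        asn    INTEGER DEFAULT NULL,   -- empty until the user asks for an ASN
        UNIQUE (collid, asn)           -- SQLite treats NULLs as distinct here
    );
    INSERT INTO asn_counter VALUES ('family', 0);
    INSERT INTO documents (id, collid) VALUES ('doc-a', 'family'), ('doc-b', 'family');
    """
)


def assign_asn(conn, doc_id):
    """Assign the next ASN of the document's collective and return it."""
    with conn:  # everything below commits together or is rolled back together
        row = conn.execute(
            "SELECT collid FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is None:
            raise KeyError(doc_id)
        (collid,) = row
        # Step 1: generate a new number from the per-collective counter.
        conn.execute(
            "UPDATE asn_counter SET highest = highest + 1 WHERE collid = ?", (collid,)
        )
        (asn,) = conn.execute(
            "SELECT highest FROM asn_counter WHERE collid = ?", (collid,)
        ).fetchone()
        # Step 2: store it, but only if the document has no ASN yet.
        cur = conn.execute(
            "UPDATE documents SET asn = ? WHERE id = ? AND asn IS NULL", (asn, doc_id)
        )
        if cur.rowcount != 1:
            # Raising here rolls back the counter bump from step 1 as well.
            raise ValueError(f"{doc_id} already has an ASN")
    # Step 3: hand the number back to the caller (the UI, in the proposal).
    return asn


first = assign_asn(conn, "doc-a")
second = assign_asn(conn, "doc-b")
try:
    assign_asn(conn, "doc-a")
    reassigned = True
except ValueError:
    reassigned = False
highest = conn.execute(
    "SELECT highest FROM asn_counter WHERE collid = 'family'"
).fetchone()[0]
```

The rejected third call leaves the counter untouched, so no number is skipped merely because someone clicked the button twice.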

What do you think?

eikek commented 1 year ago

I'm still hesitant to be honest. The technical side about generating sequence numbers never was a problem. But I don't want to add another metadata concept to docspell. This asn is not more interesting than any other metadata. And there is already so much stuff. I think, if that is to be added, then on top of something that already exists. Custom fields for example. Perhaps, as written already above, a field type "sequence" or similar could exist and it will be auto-populated with a generated sequence number (distinct sequences per field name). Hm, not sure yet :-)

madduck commented 1 year ago

I agree that there's already a lot of stuff, but we'll have to disagree on whether the ASN is more or less interesting than e.g. "concerning". Physical document storage is required by law in certain cases, and ASNs are a quintessential link between the digital and the real world.

eikek commented 1 year ago

Why not use a custom field for the ASN (that's what I do)?

madduck commented 1 year ago

@eikek two reasons:

  1. It's cumbersome to add an ASN field. Since not every document will get an ASN, this can arguably be seen as a benefit, but coming from Paperless, I'd rather see an empty ASN (which is also information) than not see the ASN field.
  2. There is no uniqueness (#542) and no easy way to get the internal next number (see above).

I can understand that you do not want to overload the UI any more. Therefore, my suggestion in #2272 is to make ASN assignment be a workflow in and of itself. If meanwhile we can figure out whether and how to enable uniqueness, and provide a counter type for custom metadata, we're probably creating code that we'll be using again later.

eikek commented 1 year ago

The importance of the ASN (i.e. the "internal" variant) is subjective, imho. For example, I use an "external ASN" and don't need the "internal" one at all :). I also use docspell for more than just household documents, and for those an ASN makes no sense. An internal counter is not very helpful to me. There is the id to identify an item, no need to create another one. There are lots of metadata, like date-created etc. I don't want to have to explain to my partner that the ASN field should be ignored …

I think "external ASN" can be very well supported by a custom field. The external ASN source will be unique and the number will be on the scanned document/paper. There is no need to create some artificial constraint in my eyes. If there is no ASN field, there is no ASN - same information as an empty field but less ui noise ;-). In my case, for example, it would be 95% noise. You can also look at the document easily to see what number it has.

The "internal ASN" case I don't understand :-) it seems like a more difficult process in my eyes. But I'm fine with a new custom field type "sequence" that automatically increases its value (and an option to ensure uniqueness). People who are not using this, don't need to be bothered, while others should be able to create their complex metadata types with these two features (I hope).

madduck commented 1 year ago

The more I think about this, the more this is a workflow topic. A sequence field will be hard to get right without API support in a multiuser environment.
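To illustrate the multi-user concern: if every client read the current maximum and wrote back max + 1 on its own, two simultaneous requests could pick the same number. A single server-side operation that serializes the increment avoids this. The sketch below uses Python threads and a lock as a stand-in for such an API call; `SequenceService` and `next_number` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SequenceService:
    """Stand-in for a server-side endpoint that hands out sequence numbers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def next_number(self) -> int:
        # The read-increment-write happens under one lock, so concurrent
        # callers are serialized and never receive the same value.
        with self._lock:
            self._last += 1
            return self._last


service = SequenceService()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 200 simulated requests arriving from several users at once.
    numbers = list(pool.map(lambda _: service.next_number(), range(200)))
```

In a real deployment the lock would be replaced by a database transaction or sequence, since several server processes may be running.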