eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Automatic re-naming of attachments #992

Open floli opened 3 years ago

floli commented 3 years ago

Hello,

in contrast to e.g. paperless-ng, docspell has two names for a document: the document name (title) and a name per attachment (file name). As docspell displays the file name quite prominently, I feel an urge to change it to something meaningful beyond scan_08082021.pdf. However, in almost all cases, the new file name does not carry any information that is not already included in the title and other meta data. This leads to the question of automatic renaming. As far as I can see, it is not possible to automatically rename the attached files following some pattern from meta data?

I think, such a function would be very beneficial. Not only for the mere reason of displaying the name, but also when downloading and sending a file.

How can this be achieved? My ideas:

I like the third way (automatic whenever meta data changes) most.

https://github.com/eikek/docspell/issues/543 is somehow related, but it is about automatic title generation, if I understood it correctly.

What do you think?

Best Regards!

eikek commented 3 years ago

Hi @floli - thanks for this proposal! I also think it's time to put some thought into this. I would also prefer either your third option or maybe even this option I have in mind: to rename it "on the fly". I've often used this strategy because of the up-to-date problem you described. In this option the name is generated when the download happens or when the client asks for the name. Of course, this also has downsides. First, it requires additional database calls when downloading a file. Then it requires another step before giving metadata to the caller. While the latter is not expensive I think, it is possible to miss a spot (but this is also possible when reacting to changes). I'm currently more in favor of the on-the-fly way, because it requires fewer changes I think. The download of a file should not happen very often, so I would pay for the extra db call in this case.

Regarding usage: I think we need a pattern in the collective settings. The only issue is now that the metadata applies to the complete item, so we need to have some way to mean first, second, third attachment etc in the pattern.
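A minimal sketch of what such a pattern could look like, in Python for brevity. The `$placeholder` syntax, the `ItemMeta` fields and the `pos` placeholder for "first, second, third attachment" are all assumptions made for illustration; none of them are taken from docspell itself.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class ItemMeta:
    # Hypothetical subset of item metadata; the real field names may differ.
    title: str
    date: str
    correspondent: str


def render_attachment_name(pattern: str, meta: ItemMeta, position: int, extension: str) -> str:
    """Fill a $placeholder pattern with item metadata and the 1-based attachment position."""
    values = {
        "title": meta.title,
        "date": meta.date,
        "correspondent": meta.correspondent,
        "pos": str(position),  # distinguishes attachments that share the same item metadata
    }
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    base = Template(pattern).safe_substitute(values)
    # Replace characters that are awkward in file names.
    cleaned = "".join("_" if c in '/\\:*?"<>|' else c for c in base).strip()
    return f"{cleaned}.{extension}"
```

With a pattern such as `${date}_${correspondent}_${title}_${pos}`, the second attachment of an item would get a name ending in `_2`, while all other parts come from the shared item metadata.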

Another option would be to think about whether this is a server task at all, or whether it can be moved to the client. But I think this option is not good, either. It would only work when displaying and not when downloading. If we want consistent behaviour across all clients, the server must be in charge.

You're right about #543 - the idea in this issue is to provide suggestions for a title for the item, not each file.

floli commented 3 years ago

Thanks for considering it!

Due to lack of knowledge, I may not fully understand your ideas on that.

> to rename it "on the fly". I've often used this strategy because of the up-to-date problem you described. In this option the name is generated, when the download happens or when the client asks for the name.

Afaik, the files are saved directly in the database, not in the file system. So they don't have a real file name, but probably there is a column file name in the same table. Correct?

Therefore, there isn't a re-naming of files, only an update of a single field, whenever meta data changes OR the value of the file name is required somewhere (your on-the-fly proposal).

> I'm currently more in favor for the on-the-fly way, because it is less changes I think. The download of a file should not happen very often, so I would pay for the extra db call in this case.

The file name is also displayed in the web-ui, meaning you have to do that extra db call every time it is displayed. Also, I think it's good not to force automatic naming on the user, but to enable/disable it on a per-document basis with a default for newly created documents. That means you won't get rid of that file name column in the database; it will just become meaningless if automatic renaming is enabled, because the name is generated on-the-fly. That doesn't sound too nice, imho.

> The only issue is now that the metadata applies to the complete item, so we need to have some way to mean first, second, third attachment etc in the pattern.

I think hard-coding to append _1 to the file name if there is more than one attachment to a document is sufficient.
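A rough sketch of that rule, in Python. The function name and the choice to number every attachment of a multi-attachment document with its 1-based position are assumptions; one could equally leave the first attachment without a suffix.

```python
def with_position_suffix(base_name: str, position: int, total: int) -> str:
    """Append _<position> before the extension, but only if the item has several attachments."""
    if total <= 1:
        return base_name
    stem, dot, ext = base_name.rpartition(".")
    if not dot:
        # No extension present: rpartition puts the whole name into `ext`.
        return f"{ext}_{position}"
    return f"{stem}_{position}{dot}{ext}"
```

A document with a single attachment keeps its generated name unchanged, so the suffix only shows up where it is needed to tell files apart.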

eikek commented 3 years ago

Sorry for the late reply! :/

The files are saved in the db, that is correct. This allows for some different ways to solve this. The user doesn't care about the filename in the database - only about the filename that is communicated to the client. We can update the database field, or we can apply the function when the file is downloaded/accessed. Both have pros and cons, of course! The pro for the "on the fly" way is that it is not destructive: if the user disables this "auto naming", the old filename is preserved. And we don't need to act on changes (which is a bit difficult imho; we need to take care of event bursts and things like that). It also makes it possible for this to be a user setting and not a global (collective-wide) setting. The big pro for the db update case is that we don't need to do extra work when downloading the file or (more importantly) when returning the search results. We have the search results already in memory, but the server needs to go through each one again to apply the auto-name function. It's not that bad, but involves one more iteration through the list….
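To make the trade-off concrete, here is a small Python sketch of the "on the fly" variant. The `Attachment` shape, the `auto_naming` flag and the `generate` callback are hypothetical names chosen for illustration; this is not how docspell is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Attachment:
    # Hypothetical shape of one search-result entry.
    item_title: str
    stored_name: str  # the file name as kept in the database; never modified here


def display_name(att: Attachment, auto_naming: bool,
                 generate: Callable[[Attachment], str]) -> str:
    """Return the generated name if auto naming is on, else the stored name."""
    return generate(att) if auto_naming else att.stored_name


def names_for_results(results: List[Attachment], auto_naming: bool,
                      generate: Callable[[Attachment], str]) -> List[str]:
    # This is the extra pass over the in-memory search results mentioned above.
    return [display_name(att, auto_naming, generate) for att in results]
```

Because `stored_name` is only read, switching `auto_naming` off again yields the original file names, which is the non-destructive property described above. The cost is the additional iteration in `names_for_results`.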