Maintain same filenames and folder structure on filesystem as in the database

rmatte commented 1 year ago

I tried using papermerge for a bit and I really like the interface. The one thing that was kind of off-putting is that the files on the filesystem don't get updated to reflect the changes I make in the UI. For instance, I can rename a file in the UI, but the copy stored on the filesystem keeps the name it had when I uploaded it. Also, when I create folders in papermerge and move documents into them, that is not reflected on the filesystem.

Why is this bad? Because if I were to lose the papermerge database due to corruption or something, even though I do still have the actual files on the filesystem, they aren't named and organized properly, so it would be a huge chore to then figure out what everything is and get going again.

It would be a significant improvement if renaming a file in the UI also renamed the file on disk, and if creating a directory in the UI created a directory on disk and moved the files into it. The files on disk should basically be a mirror image of what's in the UI in terms of filenames and folders/directories. Since the database does appear to have a link back to each individual file, you could also do this for any existing files which haven't been renamed or moved to a directory yet. If you need to keep multiple versions of files, you could simply add a version number to the end of each version, or maybe create a sub-folder representing each file and store each version in there.
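To make the version-suffix idea concrete, here is a minimal sketch of how a document's location in the UI could map to a path on disk. Everything in it is hypothetical: `mirrored_path`, `safe_name`, `media_root` and the `.vN` suffix are names I made up for illustration, not anything taken from papermerge's code or database schema.

```python
from pathlib import Path


def safe_name(name: str) -> str:
    """Make a UI name usable as a single path component (assumed rule)."""
    cleaned = name.replace("/", "_").replace("\\", "_").strip()
    if cleaned in ("", ".", ".."):
        raise ValueError(f"unusable name: {name!r}")
    return cleaned


def mirrored_path(media_root: Path, ui_folders: list[str], title: str, version: int) -> Path:
    """Return where one version of a document would live on disk.

    ui_folders is the folder chain shown in the UI, outermost first.
    The version number goes in front of the extension: invoice.v2.pdf
    """
    title = safe_name(title)
    stem, suffix = Path(title).stem, Path(title).suffix
    folders = [safe_name(folder) for folder in ui_folders]
    return media_root.joinpath(*folders, f"{stem}.v{version}{suffix}")
```

With this, `mirrored_path(Path("/media"), ["Taxes", "2023"], "invoice.pdf", 2)` gives `/media/Taxes/2023/invoice.v2.pdf` on a POSIX system. The sub-folder-per-document variant would only change the last line.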

If there's some sort of config option that can be enabled to allow for this let me know, but I read the docs and tried a lot of different stuff and I couldn't get it to behave this way.

I know that there's a command to create backups which include a copy of the database and the files in it, but that's still less than ideal. It would be very nice to be able to just back up the document directories and files somewhere as well, but for that to really be feasible they need to be renamed and organized in directories on disk in a way that makes sense.

I don't see any advantage at all to storing the documents in their original filenames all in one directory like the software is currently doing. It's a mess to work with outside of the application. This also seems like it would be fairly simple to code. Code a migration step which runs once to re-organize any existing documents on disk based on the information in the database, then have hooks in place in the code when documents are renamed and moved to mirror those actions on disk.
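Roughly what I have in mind, as a sketch only: one function that moves a single file (the body of a rename/move hook), and a one-time pass that applies it to every existing document. I'm assuming the database can hand back each file's current path, its folder chain in the UI and its title. The names `move_on_disk`, `migrate` and the record layout are invented for this example and do not come from papermerge.

```python
import shutil
from pathlib import Path


def target_path(root: Path, ui_folders: list[str], title: str) -> Path:
    """Path a document should have so that disk mirrors the UI."""
    return root.joinpath(*ui_folders, title)


def move_on_disk(src: Path, dst: Path) -> None:
    """Hook body: mirror a UI rename or move on the filesystem."""
    if src == dst:
        return
    if dst.exists():
        # Never silently overwrite another document.
        raise FileExistsError(f"refusing to overwrite {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))


def migrate(root: Path, records) -> list[tuple[Path, Path]]:
    """One-time pass over existing documents.

    records: iterable of (current_path, ui_folders, title) tuples,
    assumed to come from the database.
    Returns the (old, new) pairs that were actually moved.
    """
    moved = []
    for current, ui_folders, title in records:
        dst = target_path(root, ui_folders, title)
        if current != dst:
            move_on_disk(current, dst)
            moved.append((current, dst))
    return moved
```

A real version would also have to update the path stored in the database in the same step, and deal with the OCR output and file versions, which this leaves out.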

Thanks.

qq7te commented 1 year ago

Exactly! I thought I could come up with a patch to allow that, but unfortunately my free time is limited. I tried, but didn't get far.

@rmatte do you have the time for a proof of concept?

rmatte commented 1 year ago

I see that this is written in Python, which is the language I code in almost daily, so yeah, if I can find some time to really dig into this and come up with a patch, I will. It'll take a fair bit of time though, as I'll first need to read through the code and familiarize myself with it. This feels like something that would probably take the main developer a few hours to throw together, since they already know where everything is, versus probably 15 to 20 combined hours of effort for me to get to know the code and the database structure in depth before actually coding the changes. I'm sure there are some nuances with the OCR, the file versioning and so on which would need to be accounted for.

rmatte commented 1 year ago

I've decided to just use my new Synology NAS for my document management and write some custom scripts to do automated OCR and file placement. My scanner software does OCR already as well. It's going to be way easier than trying to refactor this package. The NAS can already index the content of documents and search through them, so it'll be perfect for this. I really do hope that the developer implements this feature eventually though, as it doesn't make any sense not to. Having documents organized neatly into folders on disk, with filenames matching what's in papermerge, just makes a whole lot of sense.

ciur / papermerge

Maintain same filenames and folder structure on filesystem as in the database #526