LycheeOrg / Lychee

A great-looking and easy-to-use photo-management system you can run on your server to manage and share photos.
https://lycheeorg.github.io/
MIT License
3.42k stars 303 forks

Better folder structure? #60

Closed alex-phillips closed 2 years ago

alex-phillips commented 5 years ago

Will lychee-laravel have a better folder structure for storing photos, i.e., not one single giant directory containing all the files? This becomes unwieldy when accessing the pictures directly via the filesystem once you get into thousands of files.

ildyria commented 5 years ago

It is not planned at the moment.

The problem with using a directory structure is that it makes it harder to have virtual copies. However, I will try to make it so that when you download a folder, you actually download the pictures with a directory structure.

Also, why would you need to access it directly via the filesystem? Lychee does not aim to be synchronized with a file system, etc.

hermzz commented 5 years ago

I think what he means is not having all thumbs in public/upload/thumbs, but rather something like public/upload/thumbs/8/a/0/1/8a013a3e3eb526e4c6971cc31b800c27.jpeg?
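As a minimal sketch of the idea (paths and names hypothetical, not Lychee's actual code), such a sharded path can be derived from the file's own checksum:

```python
import hashlib
from pathlib import Path

def sharded_path(base_dir: str, data: bytes, depth: int = 4) -> Path:
    """Build a path like <base>/8/a/0/1/8a013a3e....jpeg by using the
    first `depth` hex characters of the file's MD5 as directory levels."""
    digest = hashlib.md5(data).hexdigest()
    shards = list(digest[:depth])  # one character per directory level
    return Path(base_dir, *shards, digest + ".jpeg")
```

For example, `sharded_path("public/upload/thumbs", image_bytes)` yields a four-level tree under the existing thumbs directory.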

scratcher28 commented 5 years ago

+1. That's the main reason I forked it: my gallery is ~2 million pictures, so I had to optimize it to handle that amount of stuff. However, I have no plans to share the code because it's very project-specific.

d7415 commented 5 years ago

something like public/upload/thumbs/8/a/0/1/8a013a3e3eb526e4c6971cc31b800c27.jpeg?

I have no objections to this, but I think direct filesystem access is still something of an "advanced" function given Lychee's file naming strategy. I assume the advantage is performance, but I'm curious how much of a benefit this would give and how we choose how many directory levels to implement.

I suppose it could be an option, disabled by default, but then there would have to be a function to move all existing files (which, given the target audience, may be a lot) whenever this is changed.

I'm open to ideas, but I'm yet to be sold on the concept.

hermzz commented 5 years ago

@scratcher28 What folder structure did you use?

@d7415 The advantage isn't really performance but rather making it easier to manage and debug.

The implementation is quite simple and involves adding a couple of lines to the process for saving the images to create the directory structure, plus the migration to enlarge the URL fields to accommodate the extra characters. The largest bit of code is probably writing a command to migrate from the flat structure to the tree structure.
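That migration command could be sketched roughly as follows (a hypothetical standalone illustration, not Lychee code): walk the flat directory and move each hash-named file into a tree derived from its own filename.

```python
import shutil
from pathlib import Path

def migrate_flat_to_tree(flat_dir: Path, depth: int = 4) -> None:
    """Move hash-named files (e.g. 8a013a3e....jpeg) from a flat
    directory into an 8/a/0/1/... tree derived from the filename."""
    for f in list(flat_dir.iterdir()):  # snapshot: we mutate the directory
        if not f.is_file() or len(f.stem) < depth:
            continue
        dest = flat_dir.joinpath(*f.stem[:depth])
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest / f.name))
```

A real command would also need to update the URL columns in the database in the same transaction.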

I've calculated roughly how many images per folder you'd get with a total of 2,000,000 photos, depending on how many characters deep you go:
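That rough math can be reproduced as a sketch: assuming hex (MD5) filenames, each character of depth fans out 16 ways, so the average leaf-directory size shrinks by a factor of 16 per level.

```python
def photos_per_folder(total: int, depth: int, fanout: int = 16) -> float:
    """Average files per leaf directory when hex-hash filenames are
    sharded one character per directory level (16 values per character)."""
    return total / fanout ** depth

# Rough numbers for a 2,000,000-photo library at 1-4 levels deep:
for depth in (1, 2, 3, 4):
    print(depth, photos_per_folder(2_000_000, depth))
```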

d7415 commented 5 years ago

@d7415 The advantage isn't really performance but rather making it easier to manage and debug.

If anything I'd expect this to make it harder to manage and debug. Not by a lot, but I can't see how it would make it easier, and it definitely adds an extra level of complexity.

scratcher28 commented 5 years ago

@hermzz, Like I mentioned above, my fork is very project-specific, so I am using a tree structure that mirrors the on-disk storage (at the moment of import). Currently there is a maximum of 3,000 files per directory, but I'm now splitting my albums at 1,000, because large albums hang the browser or make it very unresponsive. My small/thumb/medium directories have the same structure as "big", and, importantly, I create the small/thumb/medium versions only on demand (not on upload or import!); the thumbs are also processed with image optimizers plus minor tweaks, and cached. I am also working on a better import mechanism because I have millions of photos.
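The on-demand generation described here can be sketched generically (all names hypothetical; `render` stands in for whatever resize/optimize step is used):

```python
from pathlib import Path

def get_thumbnail(original: Path, cache_dir: Path, render) -> Path:
    """Return a cached thumbnail, rendering it on first access only.
    `render(src, dst)` is a stand-in for the actual resizer."""
    thumb = cache_dir / (original.stem + ".jpeg")
    if not thumb.exists():  # cache miss: generate once, reuse afterwards
        cache_dir.mkdir(parents=True, exist_ok=True)
        render(original, thumb)
    return thumb
```

The upshot is that import touches only the originals; thumbnail cost is paid lazily, spread over first views.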

ildyria commented 5 years ago

I'm now splitting my albums to 1000, because large albums hang the browser or make it very unresponsive.

I am aware of this and I am thinking about adding a pagination system (but how?).
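For what it's worth, the simplest form is an offset-based slice over the album's sorted photos; a minimal illustrative sketch:

```python
def paginate(items, page: int, per_page: int = 1000):
    """Offset-based page: page 1 -> items[0:1000],
    page 2 -> items[1000:2000], and so on."""
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

The front end would then request one page at a time instead of rendering all thumbnails at once, which is what hangs the browser on large albums.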

ildyria commented 5 years ago

@scratcher28 feel free to share/PR some of your optimizations. :)

lopezio commented 5 years ago

To add an opinion on this: for me it is very important that the folder structure I have for the original pictures is retained. I think for many professional photographers this is also an important prerequisite. For me the file system structure must always be a possible "last resort", and I want a database-driven solution to be able to handle/reference any file structure added.

I had a look at Piwigo lately, but although the project deserves some sympathy, that code is a dead end. I'd rather give forking Lychee-Laravel a try and, if I succeed, why not submit some PRs. @scratcher28, do you have some pointers on the files/namespaces you worked on for your fork? Best regards.

kamil4 commented 5 years ago

I expect that retaining the original directory structure would require no changes to the front end; the modifications would be limited to the server side (we already send complete (relative) paths to the front end as part of album listings, etc.).

Off the top of my head:

This thread contains a lot of interesting ideas, such as the lazy generation of intermediate sizes. I wish we got to see some of that code! :smiley:

d7415 commented 5 years ago

Retaining the original structure sounds like a good option. I'd probably limit it to "Import from Server" (for I hope obvious reasons!) either as a configuration option, another checkbox, or both (like delete_imported). I agree that Lychee handling intermediates seems the simplest option there, and also means that Lychee would only need read access to the source images, which may be desirable.

It's worth noting that the server would need to be configured to serve any directories used this way, which may be outside of its normal scope.

lopezio commented 5 years ago

I expect that retaining the original directory structure would require no changes to the front end; the modifications would be limited to the server side (we already send complete (relative) paths to the front end as part of album listings, etc.).

That's good news: I feel much better with backend code.. :-D

  • Where would you want to store the generated intermediate sizes...

Personally, out of the original's way (I consider them "negatives", including the original directory structure) - so the separate directory tree managed by Lychee thing sounds perfect to me.

Still, it could be enhanced later, e.g. by offering options to store the generated derivatives somewhere "near" the originals, such as in "_generated" / "_derivatives" / "_rendered" / "_variants" subdirectories (to brainstorm some possible names). This would make it easier to move from one Lychee instance to another with the same media sets, and also to be able, should anything ever fail badly, to rebuild the database without having to re-render each variant.

  • Since this was last discussed, @ildyria ported much of the filesystem management to a more abstract Storage facade. I wonder if any of this could be implemented more easily now?

Is it related to this? app/Photo.php

This thread contains a lot of interesting ideas, such as the lazy generation of intermediate sizes. I wish we got to see some of that code! 😃

I actually have a lot of other must-haves on my personal list: EXIF/IPTC syncing; extending navigation way beyond just "albums" (i.e. support for at the very least the notions/tags of "Event", "People", "Tags"); a huge OpenStreetMap navigation; broad raw format support; HEIC support; support for multiple original versions of the same file... to name a few ;) I hope I can free up some time to help out rather than completely reinvent the wheel...
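Coming back to the "separate directory tree managed by Lychee" idea above, here is a minimal sketch (hypothetical names, not Lychee's API) of mapping an original into a derivatives tree that mirrors the source structure, keyed by variant size:

```python
from pathlib import Path

def derivative_path(original: Path, source_root: Path,
                    derivatives_root: Path, size: str) -> Path:
    """Mirror the original's directory structure under a separate,
    Lychee-managed derivatives tree, one subtree per variant size."""
    rel = original.relative_to(source_root)
    return derivatives_root / size / rel
```

Because the mapping is purely path-based, the derivatives tree can be deleted and rebuilt at any time without touching the originals.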

helmut72 commented 4 years ago

Some thoughts:

  1. Even with a medium photo library, a simple ls command never works, because there are too many files in one folder.

  2. The way it works right now, Lychee has to be the one and only photo-management tool

  3. For 2), too many features are missing. There is not even a simple timeline view like on mobile phones, or photo and video rotation

  4. There is no WebDAV (just an example) with a virtual album/folder-based structure to export everything again if someone wants to work with other tools and/or move away from Lychee. Manually downloading album by album and sorting it all again into another product? Too time-consuming and error-prone.

d7415 commented 4 years ago
  1. Even with a medium photo library, a simple ls command never works, because there are too many files in one folder.

ls works fine for me with >6000 photos. I'd say that is comfortably into medium, if not large.

  2. The way it works right now, Lychee has to be the one and only photo-management tool

If you're referring to just the file structure, you're probably wrong, but I'm not convinced this is relevant.

  3. For 2), too many features are missing. There is not even a simple timeline view like on mobile phones, or photo and video rotation

Feel free to open a PR or use another gallery. A user is already working on a PR for image rotation (discussed in #471)

  4. There is no WebDAV (just an example) with a virtual album/folder-based structure to export everything again if someone wants to work with other tools and/or move away from Lychee. Manually downloading album by album and sorting it all again into another product? Too time-consuming and error-prone.

That would be an extremely niche feature and I would say well out of scope. Downloading by album is the current option (see below for a better option), though I'm sure we would be interested if you would like to open a PR to download all photos in a single, structured archive.
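Such a single structured archive could be sketched roughly like this (a hypothetical helper, not an existing Lychee feature): write every photo into one zip with one top-level folder per album.

```python
import zipfile
from pathlib import Path

def archive_albums(albums, out: Path) -> None:
    """Write every photo into one zip, one top-level folder per album.
    `albums` maps album title -> list of photo Paths."""
    # ZIP_STORED: photos are already compressed, so skip re-compression.
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zf:
        for album, photos in albums.items():
            for photo in photos:
                zf.write(photo, arcname=album + "/" + photo.name)
```

Nested albums would simply extend the `arcname` prefix with the parent titles.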

Edit to add: It's important to remember that this project is developed by volunteers - you haven't paid for it. Even with paid products, ranting does not incentivise people to help you.

ildyria commented 4 years ago
  4. There is no WebDAV (just an example) with a virtual album/folder-based structure to export everything again if someone wants to work with other tools and/or move away from Lychee. Manually downloading album by album and sorting it all again into another product? Too time-consuming and error-prone.
  1. Create an Album
  2. Move all the other albums in it
  3. Download said Album.
  4. Done.
helmut72 commented 4 years ago
  4. Done.

It's easier and more stable to scp/rsync 150 GB from the file system than through a PHP process with a web browser. And the result is the same: all 30,000 videos and photos are completely unsorted. No offense, but nothing is done.

Please don't misunderstand my post; I just want to tell you my thoughts. There is never "the one and only" tool when it comes to media files. Interoperability is important: one tool offers one access type, another tool offers a different access type, a third tool offers yet another feature.

Example: my music files are sorted into folders/sub-folders and can be accessed from an SMB share. There are far better tools for mass-tagging music files than a web-based tool could ever offer. But the music is played via the great Subsonic, which also serves my music to apps on mobile phones.

Anyway, thank you for clarifying.

ildyria commented 4 years ago

And the result is the same: all 30,000 videos and photos are completely unsorted.

No, when you do a download, the files are renamed according to their titles in Lychee and the album structure shown is preserved.

But I see your point.

JacquelinDV commented 4 years ago

Hello! The download feature can be used to make a physical, structured backup of one or more albums after an update or on a schedule. Is there a way to automate this function via a shell command or cron? Thanks!

rburchell commented 3 years ago

@d7415 I hope you don't mind my long, rambling drive-by comment. I doubt I'll sell you any more on the idea, but I thought I might as well chime in, as I have some prior experience in exactly this field that may prove useful to you (although, not with Lychee: I had only just started tinkering with it a bit today).

For background, I have, over the past few years, built my own closed source version of exactly this sort of thing before: storing file content by hash, with a separate database mapping metadata, for a tool for large-scale media management, organisation, and querying. (And no, I am not seeking to replace this tool with Lychee - I have another reason for looking at Lychee)

This tool is presently archiving around 400G worth of content across approximately 2 million files. In this setup, each of the leaf directories, which are three layers deep (a/b/c/) looks to be under 100 meg or so at max (I don't have exact figures as I couldn't be bothered checking more than a couple), and contains a few hundred files.

I should note that this structure was not the one I initially used, though I knew I would eventually need it: I started out 'light' with everything in a single directory, but eventually switched when things started to not scale anymore as I kept adding content. Hashing was always in place, though, primarily to deduplicate importing the same content multiple times: a valuable feature when trying to import a large set of copies accumulated over the years.


I assume the advantage is performance, but I'm curious how much of a benefit this would give

It's hard to give a general answer, but let's go with "a lot". I can tell you that based on what I know about filesystems, but also from my own prior experience building the tool above and trying to scale it out this far: I'm pretty sure that things will break.

There are two failure modes to be aware of here. First, hard failure: many filesystems have a hard limit on the number of entries they allow in a directory. These limits aren't too low, but they are definitely reachable with a large collection (<100k files).

Secondly, there is subtle failure. Things may not break completely - they may just become increasingly sluggish until one day you're left wondering why you wait minutes for a simple shell command to complete. This soft failure will usually be reached far, far before the hard limits - and exactly how bad it is depends on which tool(s) you are using.

The reason for this "soft failure" is simple: searching a directory for an entry scales based on the size of that directory. How well it scales depends on the filesystem: for some filesystems, finding an entry involves a linear scan, so performance will fall off a cliff very quickly. Others have some sort of index structure to amortize the cost, but it will inevitably get slower the larger the directory is.

And beneath the filesystem itself, reading all of the large directory's content from the block device will also not perform very well, especially on spinning rust disks - SSDs will probably fare better, but it will still hurt overall.

and how we choose how many directory levels to implement.

I would say that you should pick the maximum library size you sensibly want to support, and then figure out how to split that to keep the files per directory at, let's say, a soft limit of 1k. My earlier-mentioned library (2 million files) is split at 3 levels, which ends up giving me a few hundred files per directory, and this seems to scale perfectly well for a library of my size (touch wood).
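That sizing rule can be written as a small calculation (a sketch assuming hex hash names, i.e. a 16-way fan-out per directory level):

```python
def levels_needed(total_files: int, max_per_dir: int = 1_000,
                  fanout: int = 16) -> int:
    """Smallest number of hash-character levels that keeps the
    average leaf directory at or under `max_per_dir` files."""
    levels = 0
    while total_files / fanout ** levels > max_per_dir:
        levels += 1
    return levels
```

For the 2-million-file library described above, this gives 3 levels, matching the split that is working in practice there.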

I personally don't see the value in making it configurable: it is a niche setting that would only add complexity, and why would your 'end users' really care about the structure when it's mechanically maintained? So perhaps even that setting doesn't really make sense, but I am not Lychee's maintainer.

d7415 commented 3 years ago

@rburchell Thanks for that. It was a good level of detail while staying on point.

My thoughts: