dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

dCache shouldn't put pool data/metadata files in flat directories #6354

Open calestyo opened 2 years ago

calestyo commented 2 years ago

Hi.

I'd like to suggest that dCache doesn't put pool data/metadata files in flat directories.

Instead it should use a layout like the one git uses for its object files, where the leading pair of hex digits becomes a directory name and the respective files are placed inside it, for example:

├── 9d
│   └── d4c70a87d0fb0bb2eae67e1e871c8a29ff616a
├── 9e
│   ├── 27db5cf4a2c6df28f3ac816a1a6e73d60b1a75
│   ├── acbe189858793625222f25e28cc8d6f09f3ac9
│   └── ed7accadb6e68449cf27383fa5ed36ba51a418
├── 9f
│   └── cca1f2a25fb5072e06ca0356cc7ebc91919714
├── a0
│   ├── 0760e3bf61c27f741f80a35cffcd10fc170681
│   ├── 3275f71e3a21765802576776c87d31bef77615
│   ├── 633329ac99d981ce3e3a5b7a1dcc422393b3c9
│   ├── d0ee4966652e0e0d357b8f6f2c9aee873c4207
│   ├── ec9c2b6c22d20a105f823f990837ec3d5bf1a2
│   └── f9015c65f105a2a5f4129676ea24ad1f894af5

One could either strip the part used for the directory name(s) from the filenames, or simply repeat it. In fact, I'd even vote for the latter.

Since dCache pools may hold a very large number of files, I'd go further than git does and allow multiple levels of directories, which is one more reason for my above "vote".
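To make the layout concrete, here is a minimal sketch of how a file ID could be mapped to a sharded path. It is only an illustration, not dCache code: the function name `shard_path` and its parameters `levels`, `width` and `keep_prefix` are made up for this example, and it assumes the IDs are hex strings whose leading digits are reasonably evenly distributed.

```python
from pathlib import PurePosixPath


def shard_path(root, file_id, levels=2, width=2, keep_prefix=True):
    """Return the sharded path for file_id below root (hypothetical helper).

    levels      -- number of directory levels to create
    width       -- number of leading characters used per level
    keep_prefix -- True: repeat the prefix in the filename (the "vote" above)
                   False: strip it, as git does for its object files
    """
    needed = levels * width
    if len(file_id) <= needed:
        raise ValueError("file_id too short for the requested sharding")
    # One directory name per level, taken from consecutive leading characters.
    parts = [file_id[i * width:(i + 1) * width] for i in range(levels)]
    name = file_id if keep_prefix else file_id[needed:]
    return PurePosixPath(root, *parts, name)
```

With `levels=1` and `keep_prefix=False` this gives the git-style layout from the tree above. Keeping the full name in the leaf means a file can be moved between levels without being renamed, which is what makes adding further levels later comparatively cheap.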

Also, dCache should provide a means (like a command) to re-organise that structure... and ideally there should be an option for setting a limit, e.g. 30k files per leaf dir... and once that limit is reached, dCache should add another layer for that part of the tree. Maybe even the other way round.

I'd guess that any such re-organisation could happen while these files are in use (rw).
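A rough sketch of what "add another layer once a leaf dir is too full" could look like. Again this is not dCache code: `split_leaf`, `depth`, `width` and `max_files` are names invented for the example, and it assumes the variant where filenames keep their full ID and are longer than the prefix used so far.

```python
import os


def split_leaf(leaf_dir, depth, width=2, max_files=30_000):
    """Add one directory level below leaf_dir if it holds too many files.

    depth is the number of directory levels already above the files, so the
    next level is named after characters depth*width .. (depth+1)*width of
    each filename. Returns the number of files moved (hypothetical helper).
    """
    with os.scandir(leaf_dir) as entries:
        names = [e.name for e in entries if e.is_file()]
    if len(names) <= max_files:
        return 0  # still below the limit, nothing to do
    start = depth * width
    for name in names:
        sub = os.path.join(leaf_dir, name[start:start + width])
        os.makedirs(sub, exist_ok=True)
        # The filename itself is unchanged; only its parent directory differs.
        os.rename(os.path.join(leaf_dir, name), os.path.join(sub, name))
    return len(names)
```

On POSIX systems a rename within one filesystem is atomic and already open file descriptors stay valid, so data being read or written is not disturbed by the move itself. Anything that opens a file by path would still need to know which layout currently applies to that part of the tree.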

The reason for this proposal is simply that most filesystems have issues with very large directories. Specifically, I noticed that btrfs seems to handle them really badly... listing a /data/ dir with only 30k files takes over a minute.

However, I haven't seen any real negative impact so far on dCache performance... so the whole proposal is not super important, but perhaps something to keep on the radar.

Cheers, Chris.

kofemann commented 2 years ago

Hi @calestyo ,

this is quite simple to implement (and I still have the patch). The main issue is backward compatibility. At the time I did quite some testing and couldn't measure any performance difference. I will try to find the branch, rebase it and build a package.