borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

reimplement files cache using hashindex code #5756

Open ThomasWaldmann opened 3 years ago

ThomasWaldmann commented 3 years ago

The files cache currently uses Python dicts because cache entries also contain variable-length lists (the chunks list). This makes them easy to deal with, but it also comes with some memory overhead.
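To make that overhead concrete, here is a rough stdlib-only illustration (exact numbers vary by CPython version and platform; the entry values are made up, and the namedtuple mirrors the one quoted later in this thread):

```python
import sys
from collections import namedtuple

# Mirrors the files cache entry layout discussed in this issue.
FileCacheEntry = namedtuple('FileCacheEntry', 'age inode size cmtime chunk_ids')

# A typical single-chunk entry for a small file (illustrative values).
entry = FileCacheEntry(0, 123456, 4096, 1_600_000_000_000_000_000, [b'\x00' * 32])

# Each container (tuple, list, bytes) carries a CPython object header on
# top of its payload, so the live objects use considerably more memory
# than the ~76 bytes of raw field data they represent.
overhead = sum(sys.getsizeof(x) for x in (entry, entry.chunk_ids, *entry.chunk_ids))
assert overhead > 76
```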

@RonnyPfannschmidt had this idea for a more efficient replacement:

ThomasWaldmann commented 3 years ago

@RonnyPfannschmidt do you want to implement that? We could offer a bounty for this.

RonnyPfannschmidt commented 3 years ago

Potentially, but not soon; currently I am stretched rather thin.

ThomasWaldmann commented 3 years ago

Current implementation:

```python
FileCacheEntry = namedtuple('FileCacheEntry', 'age inode size cmtime chunk_ids')
```

- age: 0 .. max_ttl, int
- inode: see st_ino, int 64 bit?
- size: see st_size, int 64 bit
- cmtime: see st_(c|m)time_ns, currently int 64 bit or bigint bytestr, 64 bit should be ok!?
- chunk_ids: list of chunk hashes, 256 bit each

```python
path_hash = key.id_hash(filepath)
files[path_hash] = msgpack.packb(FileCacheEntry())
```

Notable:

ThomasWaldmann commented 3 years ago

New implementation idea:

FileCache hashindex entry:

```
age:             32 bit  (shared for in-band hashtable management)
inode:           64 bit
size:            64 bit
cmtime:          64 bit
chunk_ids_start: 64 bit
chunk_ids_count: 64 bit
path_hash:      256 bit  (key)
```

Plus a separate chunk_ids array of chunk_id elements, 256 bit each.
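A minimal sketch of such a fixed-width entry using Python's `struct` module (field order and packing here are illustrative assumptions; borg's real hashindex is C code with its own layout):

```python
import struct

# Fixed-size files cache entry as proposed above: a 256-bit path_hash key,
# a 32-bit age, then five 64-bit fields (inode, size, cmtime,
# chunk_ids_start, chunk_ids_count). '<' means packed little-endian,
# so there is no padding between fields.
ENTRY = struct.Struct('<32sIQQQQQ')
assert ENTRY.size == 76  # 32 + 4 + 5 * 8 bytes per table slot

CHUNK_ID_SIZE = 32  # each chunk id in the side array is 256 bit

packed = ENTRY.pack(b'\xab' * 32, 0, 123456, 4096,
                    1_600_000_000_000_000_000, 0, 1)
assert len(packed) == ENTRY.size
```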

Notable:

ThomasWaldmann commented 3 years ago

For people running into memory issues, I guess we can assume they have a lot of relatively small files, so there is only one chunk in the chunk_ids list.

So, do I estimate correctly that we'd usually save 65 - 16 == 49 bytes per file?

The new implementation would need 108 bytes per file.

So, coming from the current implementation, savings would be ~30%.

If I didn't miscalculate/estimate something, that makes it look like a lot of work for not much savings.

Could still make sense in the context of #3868, where a fixed-length record size might be useful.
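Taking the figures above at face value (one chunk per file), the estimate can be checked with a little arithmetic:

```python
# Back-of-envelope check of the savings estimate (single-chunk files).
fixed_entry = 32 + 4 + 5 * 8          # path_hash + age + five 64-bit fields = 76
new_per_file = fixed_entry + 32       # plus one 256-bit chunk id
assert new_per_file == 108            # matches the 108 bytes quoted above

claimed_saving = 65 - 16              # 49 bytes, per the estimate above
old_per_file = new_per_file + claimed_saving  # implied current cost, ~157 bytes
assert round(100 * claimed_saving / old_per_file) == 31  # roughly 30% savings
```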

ThomasWaldmann commented 3 years ago

@RonnyPfannschmidt can you check? ^^^

RonnyPfannschmidt commented 3 years ago

I'll give it a spin later in the evening; I'm currently working on the details of a memory layout idea that may be useful for sorting it differently.

RonnyPfannschmidt commented 3 years ago

I have a basic idea for a structure that allows using shareable memory-mapped files and potentially 32 bits for most values, going by the assumption that in practice it is very unlikely to observe more than 2**30 chunk items there.

The idea needs to be tested; I will try once the toddler sleeps.

RonnyPfannschmidt commented 3 years ago

@ThomasWaldmann my initial experiment failed and needs some better planning.

I believe I'm able to provide a mmap-backed variant of the hashindex. In order to get that working correctly, I need to experiment to understand the involved data structures and get a handle on the bucketing/reading/mapping.
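For illustration, a toy mmap-backed table of fixed-size slots (a sketch of the general idea only; borg's hashindex has its own bucket layout, collision handling, and header format, none of which is modeled here):

```python
import mmap
import os
import struct
import tempfile

# One slot: a 256-bit key plus a 64-bit value; real entries would be wider.
SLOT = struct.Struct('<32sQ')
NSLOTS = 1024

path = os.path.join(tempfile.mkdtemp(), 'index.bin')
with open(path, 'wb') as f:
    f.truncate(SLOT.size * NSLOTS)        # pre-size the backing file

with open(path, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)         # shared mapping, shareable between processes
    key = b'\xab' * 32
    slot = int.from_bytes(key[:8], 'little') % NSLOTS  # naive bucketing, no probing
    SLOT.pack_into(mm, slot * SLOT.size, key, 42)
    got_key, got_value = SLOT.unpack_from(mm, slot * SLOT.size)
    assert (got_key, got_value) == (key, 42)
    mm.close()
```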

Gelma commented 3 years ago

Thanks a lot for your effort, guys.

Just for the record: in my case, with ~45,000,000 files, Borg uses ~11 GB of RAM. Also, the startup takes minutes (I guess it is the Python unpickling of objects).

I also see some swap-in/swap-out of the process (I guess the data gets updated, with all the work of the interpreter under the hood).

Anyway, I just kindly ask: what about using sqlite to store them? Then, no matter the number of files, the RAM usage would stay the same.

Thanks a lot again, Gelma

ThomasWaldmann commented 3 years ago

@Gelma borg accesses these hashtables a lot and my gut feeling about speed is: hashindex (RAM) > hashindex (mmap) >> sqlite (disk).

Especially for non-first backups of mostly unchanged files, the high speed of the in-memory hashtables makes borg very fast.

So the minutes of startup time and some RAM usage may be a good investment when the alternative is a runtime that is hours or days longer due to disk accesses.

ThomasWaldmann commented 3 years ago

@RonnyPfannschmidt maybe we can split the "save RAM" (by using less RAM), "mmap compatible" (layout) and "actually use mmap" (only have in RAM what we access) aspects. The first two are likely generally useful, the last only maybe.

The Attic author tried to use a mmapped hashindex long ago and it was unsuccessful (too slow). But back then, there was no fadvise DONTNEED or similar stuff in the code.
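Since Python 3.8, the `mmap` module exposes `madvise`, so the "only keep in RAM what we access" behavior could at least be hinted at from Python today (availability of `MADV_DONTNEED` is platform dependent; this is a sketch, not borg code):

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'index.bin')
with open(path, 'wb') as f:
    f.truncate(mmap.PAGESIZE * 16)        # 16-page backing file

with open(path, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    mm[:8] = b'touched!'                  # fault the first page in
    if hasattr(mmap, 'MADV_DONTNEED'):    # guard: not exposed on all platforms
        # Hint the kernel that this range is no longer needed; for a shared
        # file mapping, the pages can be dropped and re-read on next access.
        mm.madvise(mmap.MADV_DONTNEED)
    assert mm[:8] == b'touched!'          # data survives via page cache / file
    mm.close()
```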

RonnyPfannschmidt commented 3 years ago

The data structure I'm proposing can potentially achieve both (everything should be able to work without mmap).

I'll have to demo it