garethgeorge / backrest

Backrest is a web UI and orchestrator for restic backup.
GNU General Public License v3.0

reduce latency of list operations in snapshot browser #228

Open modem opened 4 months ago

modem commented 4 months ago

I would like to propose a few improvements for the file restore:

  1. Improve the performance of Browse and Restore Files in Backup. When opening the browse and restore area, it can take quite a while for the information to show in the Backrest interface. I tried the following:

    • with one 17GB backup, it takes 4s every time we open any of the folded items.
    • with a 6+TB repo, it takes around 20s every time we open any of the folded items. That may not seem long, but the problem is that if we want to restore a file or folder that is deep inside the folder structure, these times add up. For example, to restore a file/folder that is 5 folders deep, we need to open 6 folded items before we see the file/folder, each taking around 20s (in the latter example). The more complicated the folder structure, the longer we wait for the Backrest interface to show the items/folders we want to restore. It would be nice to get a faster structure update, maybe by caching the contents, even if we have to wait a bit longer the first time we open the backup and restore view to get the full list of items. Every later folder opening would then just load the already collected information into the interface.
  2. Allow restoring several items at the same time. We can currently only restore 1 file or folder at a time; if we want to restore multiple files/folders, we need to wait for the current restore to finish before we can start a new one.
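
The caching idea in item 1 could look roughly like the sketch below: remember each directory listing per snapshot and path, so reopening a folder never repeats the slow call. This is a minimal illustration, not Backrest's implementation; `ListingCache`, `fake_lister`, and the sample tree are hypothetical names and data, and `fake_lister` merely stands in for whatever slow backend call produces a listing.

```python
from typing import Callable, Dict, List, Tuple

# A lister takes (snapshot_id, path) and returns the entries in that directory.
Lister = Callable[[str, str], List[str]]


class ListingCache:
    """Memoizes directory listings per (snapshot_id, path)."""

    def __init__(self, lister: Lister) -> None:
        self._lister = lister
        self._cache: Dict[Tuple[str, str], List[str]] = {}
        self.slow_calls = 0  # counts how often the slow backend was hit

    def list_dir(self, snapshot_id: str, path: str) -> List[str]:
        key = (snapshot_id, path)
        if key not in self._cache:
            # Cache miss: pay the slow listing cost once.
            self.slow_calls += 1
            self._cache[key] = self._lister(snapshot_id, path)
        # Return a copy so callers cannot mutate the cached listing.
        return list(self._cache[key])


def fake_lister(snapshot_id: str, path: str) -> List[str]:
    # Hypothetical stand-in for the slow backend call; the tree is made up.
    tree = {
        "/": ["/photos"],
        "/photos": ["/photos/2024"],
        "/photos/2024": ["/photos/2024/a.jpg"],
    }
    return tree.get(path, [])
```

With this shape, only the first open of each folder is slow; going back up and down the tree reuses the stored result. It does not help with the first descent into a deep path, which still pays one slow call per level.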

garethgeorge commented 4 months ago

Hey, I'm curious how long restic ls <prefix> takes in your repos as that's the underlying command backrest is using for directory iteration.

The alternative approach that Backrest could possibly take would be a long pause on opening the file browser to index all paths (e.g. in a trie in the database) before permitting browsing but slow operations with more "immediate" startup is the tradeoff I've made for now.
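
As a rough illustration of the "index all paths in a trie" alternative, the sketch below builds an in-memory trie from a flat list of paths and answers per-directory lookups from it. It is only a sketch under assumptions: `PathTrie` and `build_index` are hypothetical names, the paths are sample data, and a real index would presumably be persisted in the database rather than held in memory.

```python
from typing import Dict, Iterable, List


class PathTrie:
    """One trie node per path component."""

    def __init__(self) -> None:
        self.children: Dict[str, "PathTrie"] = {}

    def insert(self, path: str) -> None:
        node = self
        for part in path.strip("/").split("/"):
            if part:  # skip the empty component produced by "/"
                node = node.children.setdefault(part, PathTrie())

    def list_dir(self, path: str) -> List[str]:
        """Return the sorted child names of `path`, or [] if it is unknown."""
        node = self
        for part in path.strip("/").split("/"):
            if not part:
                continue
            if part not in node.children:
                return []
            node = node.children[part]
        return sorted(node.children)


def build_index(paths: Iterable[str]) -> PathTrie:
    # The one-time, potentially slow step: walk every path in the snapshot.
    root = PathTrie()
    for p in paths:
        root.insert(p)
    return root
```

After the one-time `build_index` pass, each `list_dir` call only walks as many nodes as the path has components, which is the tradeoff described above: a long pause up front in exchange for fast browsing afterwards.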

If you need to do a lot of operations, restic supports restic mount (but mount command support isn't something that'll be added to backrest due to the underlying fuse dependency).

Allow to restore several items at the same time. We currently can only restore 1 file or folder at the time, if we want to restore multiple files/folders we need to wait until the current restore to finish until we can start a new one.

Mind opening another bug scoped to this issue? It helps to keep discussion somewhat scoped -- bulleted bugs are difficult to address wholesale.

Is the main issue that the snapshot browser closes itself after you select a file to restore?

modem commented 4 months ago

Hey, I'm curious how long restic ls <prefix> takes in your repos as that's the underlying command backrest is using for directory iteration.

I tried it in the biggest repo mentioned above (6+TB); it took 25s to get the list of all the files and folders (6735 items). time /bin/restic-0.16.4 ls 759b91f9 -r /repos/BackupMain/

[image]

This repo has several plans; I tested with the last snapshot of the biggest plan. Opening just 1 branch in the user interface takes around 5s longer than the command line.

Mind opening another bug scoped to this issue? It helps to keep discussion somewhat scoped -- bulleted bugs are difficult to address wholesale.

Will do.

Is the main issue that the snapshot browser closes itself after you select a file to restore?

No, but I find it very difficult to select anything in this popup menu, as it disappears very quickly when moving the mouse from the folder to the menu: [image]

garethgeorge commented 4 months ago

Hmm, 6735 items is actually quite a small file count (I'm guessing a small number of large files). This weekend I'll benchmark how long a full listing of a repo with a few hundred thousand files takes, to get a sense of how much time indexing the whole repo would need. If it's not too bad, indexing upfront may be a reasonable way to go here.

Just to check -- it looks like you're using a repo on a local HDD? I wouldn't expect this to take 20 seconds, it seems like something strange is going on there. It typically takes ~10 seconds for me using a remote target (backblaze storage) in a 1TB repo.

Reading https://github.com/restic/restic/blob/228b35f074ddf4dec6ce1aea51ccfc2c413d0a01/cmd/restic/cmd_ls.go#L259-L411 I think restic is doing an optimized traversal of the file tree in the data structure to avoid reading subdirectories until they're requested so there's definitely some tradeoff to be had here in terms of how much we prefetch.

garethgeorge commented 4 months ago

Ran some benchmarks on a repo in Backblaze (B2), so network fetch time is included in this test.

restic ls on all files in a repo of 231819 files took:

~/.local/share/backrest/restic-0.16.4 ls latest  8.01s user 0.91s system 20% cpu 44.355 total

restic ls / (i.e. only the top level folder) took:

snapshot d49e70a6 of [/tank_fast] filtered by [/] at 2024-04-18 18:30:05.494168399 -0700 PDT):
/tank_fast
~/.local/share/backrest/restic-0.16.4 ls latest /  5.75s user 0.58s system 32% cpu 19.501 total

Following up with restic ls /subdir (maintaining the cache), I found that once the cache for a snapshot is hot, listings were pretty fast:

~/.local/share/backrest/restic-0.16.4 ls latest /subdir/  6.42s user 0.58s system 154% cpu 4.532 total

I reset the cache before the first and second operations by running

export XDG_CACHE_HOME=$(mktemp -d)

tl;dr hard to say what the right tradeoff is here but I'm thinking snapshot browsing is in a pretty acceptable range. There may be something going on on your system that's degrading snapshot listing speed. Are you using a local storage repo / is your disk under heavy load? Any other factors that might contribute to the slowdown?
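
The cold-versus-hot cache measurement above can be reproduced with a small timing helper along these lines. It is a sketch, not the script used for the numbers in this thread: `time_command` is a hypothetical helper, and the demo times a trivial Python subprocess as a stand-in so that it runs anywhere. To measure restic, the command would be swapped for a real invocation such as `["restic", "ls", "latest", "/subdir/"]`, with the repository and credentials supplied however they normally are.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import List, Optional


def time_command(argv: List[str], cache_dir: Optional[str] = None) -> float:
    """Run argv and return wall-clock seconds.

    If cache_dir is given, the child process sees it as XDG_CACHE_HOME,
    mirroring `export XDG_CACHE_HOME=$(mktemp -d)` from the benchmark above.
    """
    env = dict(os.environ)
    if cache_dir is not None:
        env["XDG_CACHE_HOME"] = cache_dir
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command; replace with a restic invocation to benchmark for real.
    cmd = [sys.executable, "-c", "pass"]
    with tempfile.TemporaryDirectory() as cache:
        cold = time_command(cmd, cache_dir=cache)  # first run: empty cache dir
        hot = time_command(cmd, cache_dir=cache)   # second run: same cache dir
    print(f"cold: {cold:.3f}s, hot: {hot:.3f}s")
```

Reusing the same temporary directory for the second call corresponds to "maintaining the cache"; passing a fresh directory each time corresponds to resetting it.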

modem commented 3 months ago

I'm using an external HDD connected through USB. This disk is only used for my backups, so most accesses are through backrest. When I tested, backrest was not performing any backup or restore, so the disk should have been idle. I see your listing is faster than mine (my backed up files are big files), but I wonder how long it took in your backrest interface.

garethgeorge commented 3 months ago

Hey, it's also taking on the order of 10 seconds with B2 as a remote on my interface or on the order of 2-5 seconds when using a local repo on an SSD.

Is the device you're running Backrest on memory constrained? I wonder if the fork'd restic processes are starting slowly or hitting memory pressure, as starting restic for each list operation can be expensive.

At a high level, I think I'll probably aim to keep list the way it is now as the implementation is very simple and I think works well enough on most devices (<10 seconds is acceptable latency IMO as restores are uncommon), but we can look into debugging why we're seeing such slow listings on your installation.

modem commented 3 months ago

I'm running it in a Docker container on a QNAP NAS. In Portainer, I see that it uses a lot of CPU and memory when running the restic ls command (see the spikes in the charts): [image]

Restarting the container clears the allocated memory, but does not change the time it takes to run the restic ls command.

Anyway, the time it takes to run ls on the root: time /bin/restic-0.16.4 ls --json e32b25f6c22bfeb0eee8d4a6883eb25820cfcdc8d56e7f785ba9daeb14dbf590 /raid/Popcorn/ -o sftp.args=-oBatchMode=yes [image]

is more or less the same as the time it takes to read the entire backup contents: time /bin/restic-0.16.4 ls --json e32b25f6c22bfeb0eee8d4a6883eb25820cfcdc8d56e7f785ba9daeb14dbf590 -o sftp.args=-oBatchMode=yes [image]

So personally, I see a possible improvement when loading the entire backup contents, rather than just folder by folder. But of course there will be impact in the logic afterwards to treat the JSON results... But it could be done only once.
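
Processing the JSON results once could be as simple as grouping entries by their parent directory, as in the sketch below. It assumes that `restic ls --json` prints one JSON object per line and that file and directory entries carry `path`, `name`, and `type` fields; that matches my understanding of the output, but the field names should be checked against the restic version in use. `index_listing` is a hypothetical helper and `SAMPLE` is made-up data, not output from a real repo.

```python
import json
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def index_listing(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group line-delimited JSON entries by the directory that contains them."""
    by_parent: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        path = entry.get("path")
        if path is None:
            # Assumption: lines without "path" (e.g. a snapshot header)
            # are not files or directories, so they are skipped.
            continue
        by_parent[posixpath.dirname(path)].append(entry)
    return dict(by_parent)


# Made-up sample shaped like the assumed output format.
SAMPLE = [
    '{"struct_type": "snapshot", "paths": ["/raid/Popcorn"]}',
    '{"struct_type": "node", "name": "raid", "type": "dir", "path": "/raid"}',
    '{"struct_type": "node", "name": "Popcorn", "type": "dir", "path": "/raid/Popcorn"}',
    '{"struct_type": "node", "name": "movie.mkv", "type": "file", "path": "/raid/Popcorn/movie.mkv"}',
]
```

Opening a folder in the interface then becomes a dictionary lookup such as `index_listing(SAMPLE)["/raid/Popcorn"]`, at the cost of holding the whole listing in memory.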

I have 6 plans performing a backup to this repo, not sure it results in any negative impact. The 6735 items mentioned above are related to just 1 plan, not the entire repo.

garethgeorge commented 1 month ago

Sorry for the late reply on this -- interesting analysis and thanks for posting those numbers. It feels like restic is reading way more data than I'd expect when doing the list operation but it's not clear why.

With local storage in repos on my machines this is much faster. It could be an interesting question re: list operation performance, there may be some upstream optimizations to be done here if there's a performance bug.

I'm still thinking that list operations are, ideally, not something backrest tries to index and store up front as with some repos (e.g. VERY many small files) this will be prohibitive.

modem commented 1 month ago

Maybe the performance can be impacted by the size of the backup, or by having multiple plans backing up to the same repo. Or a combination of both.

garethgeorge commented 1 month ago

Opened https://github.com/restic/restic/issues/4897 and did some prototyping, I'm hopeful upstreaming some new ls capabilities to restic might be an option forward here.

modem commented 1 month ago

Looks promising.