borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

quick discovery of modified files on journaled filesystem #5865

Closed: bugsyb closed this issue 1 year ago

bugsyb commented 3 years ago

This is an RFE, and more a subject to raise for discussion, as my thought process might be wrong.

Could Borg use the filesystem journal to discover files modified since the last run? Not all filesystems may expose enough detail, but e.g. btrfs (and probably ext4) keeps enough information in its journal to identify the moment of the last borg run and process only new changes, instead of scanning all files again.

The motivation is that scanning 1.5 TB of rather static data takes Borg a long time. It might be due to the "full backup" it does each time, but observing disk I/O and CPU, it maxes out neither (1.7 MB/s on a USB3 disk and ~20% CPU on an Odroid N2), hence the idea that an improvement could perhaps come from this side.

It may be related to https://github.com/borgbackup/borg/issues/5094 and is potentially a dummy RFE - but it is worth documenting at least for others with a similar idea.

RonnyPfannschmidt commented 3 years ago

how many files is that "data"

borg always does "full" backups - so it has no concept of considering diffs as of now

bugsyb commented 3 years ago

Files => inodes; inode and 1K-block details below. Fully understood that this is an RFE, or rather a request for discussion - maybe there's no point in making any changes.

Filesystem             Inodes   IUsed     IFree IUse% Mounted on
/disk 305242112 2381749 302860363    1% /mounted/
Filesystem           1K-blocks       Used Available Use% Mounted on
/disk 4806045612 4296378724 509650504  90% /mounted

And this is what Borg shows:

Duration: 2 hours 39 minutes 1.56 seconds
Number of files: 1405702
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.56 TB              1.43 TB              7.97 MB

I am a super happy borg user and am looking at a potential option to improve speed/efficiency based on additional knowledge accessible to a root-level user (Linux/*nix specific, though Windows/NTFS might offer something similar). Obviously the idea can be discarded, since borg would then no longer be just a "simple backup solution", but to get good improvements, sometimes things have to get more complicated and gain a dependency.

Thanks again for amazing backup solution!

RonnyPfannschmidt commented 3 years ago

As far as I can tell your use case hits one of the bad cases

Massive amount of small files with no dupes

There may be something else going on

Can you verify whether borg is using the files cache or not?

bugsyb commented 3 years ago

Would you mind shedding some light on how to double-check that? The /root/.cache/borg/<SHA?>/ folder gets updated on each run and is ~2 GB in size. Hence I'd say yes, it uses it, but I might be wrong.

The command to create the archive is: borg create --verbose --filter AME --list --show-rc --exclude-caches --exclude-from backup.lst <REPO> <DISK_MOUNT_PATH> --compression lzma,9 --info --verbose -x --stats

For a long time I just accepted it taking that much time, but recently I got to thinking about why it takes so long... as no clear bottleneck is visible.

/root sits on an SD card, but that gets ~20 MB/s of read throughput and not that much write during a typical backup. I'm tempted now to move it to the SSD and symlink it.

ThomasWaldmann commented 3 years ago

@bugsyb borg always backs up ALL files, so we need each file's metadata and content data. each backup is a full backup.

Reading the content data can be avoided IF we detect the file to be unmodified (by comparing ctime,inode,size from file metadata against the files cache - if there is a match, the file did not change). But to do that, we need to stat() the file. Additionally, we need to read flags, xattrs and acls (because we do not cache them in the files cache).

Thus, knowing the changed-only subset of all files is not useful, because we need to read these metadata from each file anyway.

The only way to solve that would be to make borg work on a full+incremental basis, but that would make it way more complex (for prune, for delete, for restoring files), so that is not something we want to do.
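
For illustration, a minimal sketch of the kind of ctime/inode/size comparison described above - an assumption-laden stand-in, not borg's actual files-cache code (the real cache also stores the list of chunk ids per file):

import os

def looks_unmodified(path, files_cache):
    # files_cache is a hypothetical dict: {path: (ctime_ns, inode, size)}
    # one stat() per input file is needed no matter what
    st = os.stat(path, follow_symlinks=False)
    return files_cache.get(path) == (st.st_ctime_ns, st.st_ino, st.st_size)

Even when this returns True, flags, xattrs and acls still have to be read for that file, which is the point made above.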

ThomasWaldmann commented 3 years ago

@bugsyb btw:

bugsyb commented 3 years ago

@ThomasWaldmann - do I read you correctly that the slowness seems to be due to reading file metadata to check whether it has been modified? Looking at disk I/O, it doesn't seem to be reading the full files. If that's the case, it seems to be slow for ext4 too, right?

Not sure about ext4, but btrfs certainly has a transaction log with transaction IDs, which could be used to detect any changes at lightning speed. This requires root-level privileges, but for whole-filesystem backups borg is run as root anyway.

There might be good sources to look at for quick implementation ideas: https://github.com/Zygo/bees

This is more about sharing ideas to improve performance on btrfs. None of the options I use (compression, verbosity) should impact that performance, right? I really appreciate the comments though - I will think about improving these. Any suggestions re parameters to get the best compression level? (I know it depends on the content, but for a general use case?) In my case it is a massive stash of family photos, etc. (no movies, apart from some camera videos - a small portion of the filespace). Changing compression probably won't impact space consumption that much - but I am always open to improvements.

ThomasWaldmann commented 3 years ago

The files-cache based "is unmodified" check is based on a stat() call on each input file (and files-cache contents). If the check says "unmodified", it will not read content data from the file at all because we know for sure that the file content is unmodified then (borg will just check that we still have all its chunks in the repo, which is a very quick hashtable lookup).

But, in any case (except when giving the --noxattrs --noacls --noflags options), it will read flags, xattrs and acls for each input file.
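
For illustration, reading the xattrs alone already costs one or more syscalls per input file; a minimal Linux-only sketch (flags and ACLs need further calls, e.g. an ioctl for the inode flags):

import os

def read_xattrs(path):
    # listxattr() plus one getxattr() per attribute, for every input file
    attrs = {}
    for name in os.listxattr(path, follow_symlinks=False):
        attrs[name] = os.getxattr(path, name, follow_symlinks=False)
    return attrs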

About the transaction log giving information about modified files: we cannot get all required information for all files from there, see the paragraphs above. And as we need to get it from somewhere, there are only 2 options: from the fs or from another archive. borg uses the first method to be independent of other archives.

This is a quite fundamental thing, basically making all attempts to optimize by "knowing changed files" futile, no matter whether it is via journal or other mechanisms (inotify and co).

compression for sure impacts performance, especially for high compression levels of lzma, zlib or zstd. more for first backups of course than for subsequent backups with little new data chunks.

verbosity may impact performance sometimes, e.g. if you give --list --verbose and have borg scrolling lots of lines in a big gui window.

lz4 (default) is an extremely quick compression and usually saves more time (by reducing I/O) than it needs (for compressing). of course it compresses less well than zstd, but that's the usual time vs. space thing.

best compression would be some relatively high zstd level. higher level, better compression, but also needs more cpu then, so just try it and choose wisely.
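
A rough way to "just try it" is to time a few compressors on a sample of your own data. A small sketch using only the Python standard library (zlib and lzma as stand-ins; borg's zstd and lz4 levels trade time against ratio in the same way; sample.bin is a hypothetical file taken from your data set):

import time
import zlib
import lzma

with open("sample.bin", "rb") as f:
    data = f.read()

for name, compress in [("zlib-6", lambda d: zlib.compress(d, 6)),
                       ("zlib-9", lambda d: zlib.compress(d, 9)),
                       ("lzma-6", lambda d: lzma.compress(d, preset=6))]:
    t0 = time.perf_counter()
    out = compress(data)
    # lower ratio = better compression; watch how the time grows with the level
    print(f"{name}: ratio {len(out) / len(data):.2f}, {time.perf_counter() - t0:.2f}s")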

photos (jpeg) usually do not compress because they are already compressed, same for mpeg or other compressed videos.

jdchristensen commented 3 years ago

I don't think this will affect your speed too much, since things are slow for you even when very little has changed, but since you are dealing with incompressible data, I recommend something like --compression auto,lz4, which will avoid compressing chunks that don't appear to be compressible.

To check that the files cache is working, you should check that you have very few files marked with the M (modified) flag in the output. They should instead be marked U (unchanged). With your filter setting, the unchanged files won't be listed, so as long as you are only seeing truly modified files, things should be good.

There does seem to be something fishy going on in your set-up. You have

Duration: 2 hours 39 minutes 1.56 seconds
Number of files: 1405702

On my machine, for a local borg backup with little changed, I get:

Duration: 48.95 seconds
Number of files: 263828

You have 5 times as many files, but it takes 194 times longer. My machine uses 100% of one core during the backup. For a remote backup over a fast network, I get almost exactly the same speed. These are both using ext4 on basic hardware, SSD drive.

Borg always reads the file with the most recent timestamp. If that file is large, it could slow things down, so try doing "touch dummyfile" before starting the backup to rule this out.

PS: While testing for speed, work with a smaller subset of your data, so you can try things more quickly.

bugsyb commented 3 years ago

@ThomasWaldmann, @jdchristensen - thank you for your insights. I am still convinced that the journal might dramatically speed things up - let me explain why below. Speed-wise, my belief is that it is due to having an HDD; this, together with stat(), is probably the explanation for what is being observed. More on this below.

Journal - why I believe it's a good idea.

The journal, at least on btrfs, has an increasing counter & log of transactions (filesystem modifications). Any time the FS is modified, the transaction counter increases and the journal gets populated. Therefore there's no change without an entry in the journal.

Knowing the above, the following approach might be taken for lightning-fast identification of modified FS items (a rough sketch follows further below):

First backup:
a) check the transaction counter / pointer in the journal and record it at the beginning of the process
b) do the backup as normal

Subsequent backups:
a) check the transaction counter and compare it with the previously recorded one
b) identify the FS elements modified between that transaction and now, based on the transaction counter
c) filter/overlay filters to exclude folders not covered by the backup, as potentially not the whole FS is being backed up
d) proceed with backing up the modified files

This avoids the stat() calls for each and every file, as no file could be modified without an entry in the transaction log.

What's missing in this logic? I'm not saying that borg has to do it - if I knew how to put code around it, I would - but I don't have enough expertise. I just want to share the logic, which could be filed as an RFE for the far future. Maybe the above logic has a major flaw - I just don't see it, but I am open to discussion and to learning about it. (No provocation whatsoever - just open for brainstorming.)
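
For concreteness, a rough sketch (not borg code) of how the btrfs generation counter could be used to list changed paths; it shells out to btrfs subvolume find-new, must run as root, and the output parsing is deliberately naive (paths containing spaces would need more care):

import subprocess

def btrfs_changed_paths(subvolume, last_generation):
    # list paths changed since last_generation and return the new generation
    # to remember for the next run
    out = subprocess.run(
        ["btrfs", "subvolume", "find-new", subvolume, str(last_generation)],
        check=True, capture_output=True, text=True).stdout
    paths, new_generation = set(), last_generation
    for line in out.splitlines():
        if line.startswith("transid marker was "):
            new_generation = int(line.rsplit(" ", 1)[1])
        elif line.startswith("inode "):
            # naive: treat the last field as the path
            paths.add(line.rsplit(" ", 1)[1])
    return paths, new_generation

As noted elsewhere in this thread, even with such a list, borg would still have to stat() and read the metadata of every file, so this alone does not remove the per-file work.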

Speed - backup update (creation, but with very few changes).

In regards to speed, I believe we might have identified the culprit behind the difference in times. The data is on an HDD (spinning - just large storage where I don't really need speed; it runs private file sharing for the family via Nextcloud, so an HDD is fast enough for a couple of people, plus other little things stored on it and used daily).

Now... knowing that borg calls stat() on files to get metadata, it might be just that the HDD head has to move a lot, hence the drastic difference between HDD and SSD. The CPU in this N2 is weak compared to any modern laptop (it's a SoC) - I believe I've mentioned that - but it is underutilized, so that puts all the focus on the HDD and head movement due to stat().

ThomasWaldmann commented 3 years ago

The major flaw is that we need information about all files. And I think that while every fs update goes through the journal, you can not expect the information to be available in the journal for a long time.

I am not a fs developer, so I might be wrong and it also might depend on the fs - which is another issue: we do not want to have fs-specific code in borg, but rather work in a general way no matter what fs one uses.

HDDs of course are slower than SSDs, esp. if you have a lot of small files. Each random access on HDD is 10ms+. borg optimizes accesses a bit, by doing them in inode order.
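
A hedged sketch of that kind of ordering (an assumption about the general idea, not a copy of borg's code): list a directory first, then stat() the entries sorted by inode number to reduce seeks on a spinning disk.

import os

def stat_in_inode_order(directory):
    # listing is cheap; sorting by inode number makes the following stat()
    # calls hit the disk in a more sequential pattern
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.inode())
    return [(e.path, os.stat(e.path, follow_symlinks=False)) for e in entries]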

jdchristensen commented 3 years ago

@ThomasWaldmann is right that incorporating such logic would be error prone and hard to maintain. In practice, the stats don't take too long, so I still suspect there is something fishy with your setup. I just did a test on a HDD:

Duration: 33.46 seconds
Number of files: 183049

Scaling this up linearly to 1.4M files gives 4 minutes and 16 seconds. So even with that many files, on standard, budget hardware, the stat times aren't slow enough to be a major concern. (The system has enough RAM that a lot of filesystem data is probably cached from yesterday's backup.)
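
For reference, the extrapolation is simply the per-file time multiplied by the larger file count:

per_file = 33.46 / 183049      # seconds per file in the HDD test above
print(per_file * 1_400_000)    # ~256 s, i.e. roughly 4 minutes 16 seconds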

If you run two backups consecutively, can you tell us how long the second one takes? Be sure to "touch" a file, to avoid the issue I mentioned, and to check that borg doesn't think that files were modified in the second run.

RonnyPfannschmidt commented 3 years ago

@ThomasWaldmann could it be a reasonable consideration to be able to stream/consume encoded diffs/deltas into and between borg repos? (the idea being that a filesystem diff or a journal-detail diff could be used to create a "diff pack" to send - a diff that contains all the items not part of the other backups)

then borg would receive the diff, and a fs-specific tool would need to create it

this could also be applied to btrfs send/btrfs send with a base

bugsyb commented 3 years ago

@ThomasWaldmann, great idea from @RonnyPfannschmidt. The earlier-mentioned bees tool uses the journal log to deduplicate files (intrusive into data), so the journal has to be consistent. It certainly steps into taking advantage of specific filesystems, though the speedup that can be achieved is so massive, with no negative side on CPU/mem, etc., that it might be worth investigating.

Back to my issue: I ran the backup 3 times (with small/very small modifications in between) and it consistently shows above 2 hours for just a scan. No high CPU or I/O is observed. I am sort of clueless; it could be head-move time, but shouldn't that show up as I/O wait time? With that said, metadata should end up in buffers/cache - no other real activity was happening on this little system (well, bind9 and pihole + fail2ban and other small items - but none of them CPU- or I/O-intensive at all).

------------------------------------------------------------------------------
Archive name: n2-storage-hdd-20210702_103757
Time (start): Fri, 2021-07-02 10:38:08
Time (end):   Fri, 2021-07-02 12:44:17
Duration: 2 hours 6 minutes 9.27 seconds
Number of files: 1432640
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.56 TB              1.44 TB             27.48 MB
All archives:               56.59 TB             51.63 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2209996             74553151
------------------------------------------------------------------------------
terminating with success status, rc 0

------------------------------------------------------------------------------
Archive name: n2-storage-hdd-20210702_125328
Time (start): Fri, 2021-07-02 12:53:39
Time (end):   Fri, 2021-07-02 15:01:32
Duration: 2 hours 7 minutes 52.88 seconds
Number of files: 1432640
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.56 TB              1.44 TB            506.52 kB
All archives:               58.17 TB             53.07 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2210102             76637960

------------------------------------------------------------------------------
Archive name: n2-storage-hdd-20210702_151441
Time (start): Fri, 2021-07-02 15:14:55
Time (end):   Fri, 2021-07-02 17:28:35
Duration: 2 hours 13 minutes 40.50 seconds
Number of files: 1432711
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.56 TB              1.44 TB            112.92 MB
All archives:               59.74 TB             54.52 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2210478             78722887
------------------------------------------------------------------------------
terminating with success status, rc 0

Memory

MemTotal:        3795968 kB
MemFree:          104304 kB
MemAvailable:    2046348 kB
Buffers:          372932 kB
Cached:          1594488 kB
SwapCached:        54760 kB
Active:          2084980 kB
Inactive:         700708 kB
Active(anon):     644036 kB
Inactive(anon):   362108 kB
Active(file):    1440944 kB
Inactive(file):   338600 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       3145724 kB
SwapFree:        2032340 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:        806988 kB
Mapped:           352724 kB
Shmem:            187876 kB
Slab:             422424 kB
SReclaimable:     228036 kB
SUnreclaim:       194388 kB
KernelStack:       19152 kB
PageTables:        22020 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5043708 kB
Committed_AS:    7452120 kB
VmallocTotal:   263061440 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:         524288 kB
CmaFree:               0 kB
bugsyb commented 3 years ago

For comparison, same system, SSD disk (a much smaller backup, though): a surprisingly similar level of milliseconds per file (duration divided by number of files), ~5.5981 vs ~5.2856 (HDD vs SSD).

Though in this case, the middle run took a bit more than a third of the time of the first and last.

Both disks end up in the same repository (for dedup purposes):

------------------------------------------------------------------------------
Archive name: n2-storage-ssd-20210702_102548
Time (start): Fri, 2021-07-02 10:26:09
Time (end):   Fri, 2021-07-02 10:35:13
Duration: 9 minutes 4.30 seconds
Number of files: 98740
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               10.41 GB              3.14 GB             79.01 MB
All archives:               55.03 TB             50.20 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2209907             72568322
------------------------------------------------------------------------------
terminating with success status, rc 0

------------------------------------------------------------------------------
Archive name: n2-storage-ssd-20210702_124613
Time (start): Fri, 2021-07-02 12:46:29
Time (end):   Fri, 2021-07-02 12:50:15
Duration: 3 minutes 46.07 seconds
Number of files: 98741
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               10.42 GB              3.15 GB             12.99 MB
All archives:               56.60 TB             51.64 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2210094             74653130
------------------------------------------------------------------------------
terminating with success status, rc 0

------------------------------------------------------------------------------
Archive name: n2-storage-ssd-20210702_150229
Time (start): Fri, 2021-07-02 15:02:43
Time (end):   Fri, 2021-07-02 15:11:25
Duration: 8 minutes 41.91 seconds
Number of files: 98742
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               10.43 GB              3.15 GB             56.56 MB
All archives:               58.18 TB             53.08 TB              1.53 TB

                       Unique chunks         Total chunks
Chunk index:                 2210304             76737943
------------------------------------------------------------------------------
terminating with success status, rc 0
adept commented 3 years ago

Your earlier comment mentioned the "path where the filesystem is mounted". Is this storage local? What kind of filesystem is it on? Is the name of the mount point always the same?

I can only echo what others are saying: there is something fishy in your setup. I have 1.5 TB of photos and videos (about 150K files) on a local btrfs fs (some time ago it was on a local ext4 fs), and "borg create" consistently takes between 2 and 3 minutes - both now on btrfs and in the past on ext4.

bugsyb commented 3 years ago

No question that there's something fishy - in fact, I've moved it to ext4 for tests - same results. Given it's a small CPU (Odroid N2) with 6 cores (4 fast, 2 slow; details below), my thinking is to rule out it being CPU-related by setting CPU affinity for the process. This should make it easier to track down where the bottleneck is.

The filesystem is local to this box - disks connected over USB, the fs in a LUKS container - but again there's no CPU issue visible, and no kernel threads (USB or LUKS) pop up to the front the way they usually do when there's CPU-intensive use of these.

model name  : Amlogic S922X rev a
Hardware    : Hardkernel ODROID-N2

Level of I/O at filesystem block level (unencrypted container):

dd if=/dev/mapper/storage of=/dev/zero bs=1M count=1000 skip=6000
1000+0 records in
1000+0 records out
1048576000 bytes (1000.0MB) copied, 13.138527 seconds, 76.1MB/s

Knowing that the SSD behaves similarly, head-move time can be ruled out.

It is CoreElec (Kodi) orientated small distro, with kernel:

Linux n2 4.9.113 #1 SMP PREEMPT Sun Jun 6 11:49:21 CEST 2021 aarch64 GNU/Linux

I'm all ears as to how to narrow down the culprit.

adept commented 3 years ago

So files that you back up have stable full paths (as your mount point for this filesystem is always the same) and stable inode numbers and ctimes/mtimes, and no fuse is involved. Is this all correct?

bugsyb commented 3 years ago

Yes, it's the same mount point, mounted shortly after boot, and the system is rebooted rather rarely, most of the time once every couple of months (kernel-related updates).

adept commented 3 years ago

You are running with --filter AME. Does it list all the files as A or M while it runs?

If you back up just a small folder from that filesystem into a newly created repository with --filter AME and the rest of your usual flags, does it exhibit the same behavior (files reported as added/modified)?

bugsyb commented 3 years ago

It only lists modified/added files (not all).

Created a new repo for tests - probably too small a test bucket, so it went quickly:

  1. First backup

    Time (start): Sat, 2021-07-03 00:42:53
    Time (end):   Sat, 2021-07-03 00:49:34
    Duration: 6 minutes 40.90 seconds
    Number of files: 3386
    Utilization of max. archive size: 0%
    ------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
    This archive:              339.89 MB            310.76 MB            308.17 MB
    All archives:              339.89 MB            310.76 MB            308.17 MB
    
                       Unique chunks         Total chunks
    Chunk index:                    3516                 3541

2nd run

Time (start): Sat, 2021-07-03 00:52:08
Time (end):   Sat, 2021-07-03 00:52:11
Duration: 2.47 seconds
Number of files: 3387
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              339.90 MB            310.76 MB             23.62 kB
All archives:              679.79 MB            621.52 MB            308.19 MB

                       Unique chunks         Total chunks
Chunk index:                    3520                 7083

Not sure if I should create 100,000 files (I would need to write a small script to create them), or run another backup to push the buffers out and then re-run the test on that small set?

adept commented 3 years ago

So this test seems to behave as expected (the second run is fast, no files listed as erroneously modified, etc.). How long does this take to run on those 1.5 million files of yours?

time find . -type f -print0 | xargs -0 stat --format '%Y' > /dev/null
bugsyb commented 3 years ago

I ran an additional test.

I generated the files this way (small sizes):

#!/opt/bin/bash
# create 10 x 10 directories with 1000 small files each (100,000 files total)
OD="$PWD"
for i in {1..10}; do
  mkdir "$i"; cd "$i"
  for j in {1..10}; do
    mkdir "$j"; cd "$j"
    for k in {1..1000}; do
      echo "${i}_${j}_${k}" >> "$k"
    done
    cd ..
  done
  cd ..
done

Then I ran a backup (same new repo as in the step above):

Time (start): Sat, 2021-07-03 01:06:03
Time (end):   Sat, 2021-07-03 01:10:33
Duration: 4 minutes 29.85 seconds
Number of files: 103388
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              362.02 MB            319.32 MB              3.12 MB
All archives:                1.04 GB            940.83 MB            311.31 MB

                       Unique chunks         Total chunks
Chunk index:                    4664               110768
------------------------------------------------------------------------------

The second and subsequent runs look like this:

A /var/media/storage-hdd/tmp/modified/dupa/10/10/999
A /var/media/storage-hdd/tmp/modified/dupa/10/10/1000
------------------------------------------------------------------------------
Archive name: n2-storage-hdd-tests-tmp-20210703_013638
Archive fingerprint: 4e6a7d0cf4e81e88dbfd9618ea722578eaa37977e5271a403a0584ccd22c8f0f
Time (start): Sat, 2021-07-03 01:36:48
Time (end):   Sat, 2021-07-03 01:37:44
Duration: 56.75 seconds
Number of files: 103388
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              362.02 MB            319.32 MB              5.73 kB
All archives:                2.13 GB              1.90 GB            311.33 MB

                       Unique chunks         Total chunks
Chunk index:                    4667               421823
------------------------------------------------------------------------------

What is weird is that it shows the same set of files again and again (not all of them - just the last ~30 files from the /10/10/ folder each time).

At the same time:

time find . -type f -print0 | xargs -0 stat  > /dev/null;time find . -type f -print0 | xargs -0  cat > /dev/null

real    0m0.750s
user    0m0.364s
sys 0m0.380s

### real read of these files
real    0m6.991s
user    0m0.016s
sys 0m0.780s

It obviously takes much more time for borg to run over these files - not sure why.

adept commented 3 years ago

Well, borg does more work than just reading the files (chunking, deduplication), so it will always be slower than just reading the files. So in your 100K-files test, some files are consistently being reported as "A" on every run?

jdchristensen commented 3 years ago

Remember that all files whose timestamp is the same as the most recent file are rechunked every time. As I mentioned twice above, be sure to touch one small file in the tree before running borg, so only that small file is rechunked. You should do this for the original backup set as well, if you haven't already.

One other thing to check is whether borg is swapping.

ThomasWaldmann commented 3 years ago
SwapTotal:       3145724 kB
SwapFree:        2032340 kB

Your system is "using" 1GB of swap already. Depending on how active (page in / page out) that usage is, this might be an issue or not.

So maybe monitor page activity while borg is running. If it is paging in/out all the time, you're using more memory than you have RAM, and that might explain the slowness. See the borg docs/FAQ for some hints on how to optimize memory needs.
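
One way to watch that is e.g. vmstat 1, or, as a small illustrative sketch (not from the borg docs; it assumes the third-party psutil package is installed), sampling the cumulative swap-in/out counters while the backup runs:

import time
import psutil

# print bytes swapped in/out per 5-second interval while borg runs;
# sustained non-zero deltas indicate real memory pressure (stop with Ctrl-C)
prev = psutil.swap_memory()
while True:
    time.sleep(5)
    cur = psutil.swap_memory()
    print(f"swapped in: {cur.sin - prev.sin} B, out: {cur.sout - prev.sout} B")
    prev = cur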

bugsyb commented 3 years ago

Highly appreciate your help and feedback.

@ThomasWaldmann, a small explanation of some details: the "swap" is in fact a zram device and effectively sits in RAM - there's no swap in/out to any of the drives. There's no kernel swap process visible during the borg backup.

For the benefit of the tests, swap has been turned off completely and the usual backup run; it was again dancing around 2h40min, with the best time of 2h20min on the first run.

@adept, fully understood that Borg does "a little bit more" than just file reads/stats - it was just to demonstrate that I/O or metadata reads don't seem to be the bottleneck.

@jdchristensen - on the previous runs, I did for sure touch at least one file on the last run (the test runs on that artificial set of files - yet it still showed those listed files as backed up earlier). It is not a concern for me, as the focus is more on the duration of the real backup, which, as you guys pointed out, is unusually long. With swap off, there's no question of Borg swapping.

In all honesty, if you guys hadn't pointed out that it is unusually slow, I would not have thought that I could or should expect more from Borg, but at the same time I do admit it is surprisingly slow, given there is no visible bottleneck.

Happy to run additional tests - I would just need some guidance on what/how to capture for investigation, as I'm getting backed into a corner: my only remaining idea would be to dismantle the setup - exclude encryption (LUKS), stop any other processes on the system, etc. But... this is not viable for me, as I don't have a spare box to take over what I'm using this box for. At the same time there's no logical explanation for what could be the bottleneck.

I can run strace - but I'm not sure if this would help at all, especially as strace will have its own hit on performance and timing.

cat /proc/meminfo  
MemTotal:        3795968 kB
MemFree:           56816 kB
MemAvailable:    1405804 kB
Buffers:           68548 kB
Cached:          1241788 kB
SwapCached:            0 kB
Active:          1476280 kB
Inactive:        1539092 kB
Active(anon):     818096 kB
Inactive(anon):  1131800 kB
Active(file):     658184 kB
Inactive(file):   407292 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               580 kB
Writeback:             0 kB
AnonPages:       1700028 kB
Mapped:           317568 kB
Shmem:            244860 kB
Slab:             535780 kB
SReclaimable:     349048 kB
SUnreclaim:       186732 kB
KernelStack:       18864 kB
PageTables:        21056 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1897984 kB
Committed_AS:    7395388 kB
VmallocTotal:   263061440 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:         524288 kB
CmaFree:           26416 kB


ThomasWaldmann commented 1 year ago

closing this, borg needs metadata of all files, so knowing the modified files has no advantage.

bugsyb commented 1 year ago

closing this, borg needs metadata of all files, so knowing the modified files has no advantage.

Knowing which files were modified would allow pulling all the other necessary details about those files without scanning everything on the system.

ThomasWaldmann commented 1 year ago

Please read the comments above. We need metadata of ALL files, not just of the modified ones.