Restoring without local database takes ages

jarmo commented 9 years ago

I tried to restore a single ~500kB file from a ~500GB backup without a local database. It takes hours and hours and i can see a lot of Downloading file (27,22 KB) ... messages. As soon as i specify local database with --dbpath then the restore is done in minutes.

It does take hours even if the backup destination is local disk.

There seems to be something fundamentally wrong when restoring a single file with exact path from local destination takes more than few minutes.

Why does it take that long?

Using latest duplicati 2.x version.

jarmo commented 9 years ago

It finally finished. Pay attention to the duration.

Checking remote backup ...
  Listing remote folder ...
Searching backup 0 (25.05.2015 1:44:00) ...
Checking existing target files ...
  1 files need to be restored (563,18 KB)
Scanning local files for needed data ...
1 remote files are required to restore
  Downloading file (49,95 MB) ...
  0 files need to be restored (0 bytes)
Verifying restored files ...
Restored 1 (563,18 KB) files to c:\Users\Jarmo\Desktop\restore
Duration of restore: 15:24:03

jarmo commented 9 years ago

Restoring now all files from a 70 and 700 GB backup without local database files. Wish me luck, because this is a real situation this time and not just a test. Currently seeing "Downloading file (47.78 MB) ..." appearing REALLY slowly although the backup location is a local disk.

kenkendk commented 9 years ago

If you do not have a local database, Duplicati will build one before continuing. There is some logic in the latest (2.0.0.90) version that checks if it can avoid to restore the entire database, which should speed things up. If something is not matching correctly, Duplicati will start looking for the needed information in potentially all files.

If you need to perform the restore, I recommend that you run the "repair" command first to generate the local database. This will make it possible to stop and resume the restore process without needing to generate the database again.

The 2.0.0.90 version also features a recovery tool, which works with a simple text file instead of a database.

jarmo commented 9 years ago

@kenkendk thank You for your reply.

I can now tell more about the whole situation, described in the order of steps done:

1) HDD failed on 15th Oct; 2) Made clean install of Windows 10 at Oct 16th; 3) Started two simultaneous restore processes at Oct 16th of ~70GB and ~700GB backups - both without having a local database present; 4) At the evening of Oct 18th I saw that restore process for ~70GB was successfully completed, but there were no files restored from my ~700GB backup due to error similar to #1143 - not a single file - same problem can be reproduced on a Windows and Linux environment; 5) Fortunately I had a local databases backupped in that ~70GB backup for both of the backups so I started restore process again of ~700GB backup, this time with the option --dbpath option pointed to the recently restored sqlite database - this happened on the same evening of Oct 18th; 6) Today is Oct 28th and restore process has still not (!!!) finished - there's still about ~140GB to restore according to duplicati;

Good thing is that I've managed to get everything back from the first backup and it seems that restore process for the larger backup has been slowly progressing. Can't still say for sure if I've managed to get everything back yet or not since I'm not touching that PC just in case.

There's some things I've noticed:

--no-local-db=true does not seem to make any difference when trying to restore without a database, but reading from the advanced help section it does seem that rebuilding of the local database is skipped;
I have quite a good performance desktop PC with 8 CPU cores, but duplicati restore process only utilizes about ~12% of that CPU constantly;
There's almost no IO according to Performance Monitor or even HDD led's - yes there are occasional spikes, but most of the time nothing happens;
Restoring from the GUI (e.g. web-page) does not work (at least, didn't restore anything within a 48h from that ~70GB backup) - it seems that backups as big are just not suitable for that GUI solution;

In conclusion:

In some cases, Duplicati won't even restore a single file due to checksum mismatch, when trying to restore without a database - however, when restoring with a database, checksum does match - weird;
Duplicati does not use CPU nor IO power fully to perform restore operation, thus making it really painfully slow;

Some background about the contents of my backups as well:

~70GB backup is from a system disk containing mostly applications configuration, documents, etc - e.g. many small files;
~700GB backup consists mostly of RAW images, psd and jpeg images - more bigger files;

In short - restore process is still in progress (after 10+ days!), but at least there's hope for getting stuff back. However, I have to admit that I'm quite disappointed already in Duplicati and looking for (even paid) alternatives.

Hopefully, these issues will be resolved in a future version of duplicati and no-one else have to go through similar suffering.

jarmo commented 9 years ago

Update: restore process finally finished.

Here's what it outputted:

Restored 50641 (766,85 GB) files to h:/restore
Duration of restore: 04:39:52

I'm not sure what it means by 4 hours, because it actually happened from 18th Oct to 1st Nov.

kenkendk commented 9 years ago

The duration text displays hh:mm:ss, so the additional days are not shown.

Happy to hear that it worked after all that time.

I am aware of performance issues, but if you see no disk IO, I would guess that the majority of the time is spent on network traffic (either to a local share, or to the remote end).

jarmo commented 9 years ago

@kenkendk, that isn't the case - there was no network IO present as well and there would not have been any need (unless everything is sent to NSA :P) because backup was on a different (from restore disk) local physical disk. In other words, there was no need to retrieve a single byte over a network.

kenkendk commented 9 years ago

I assume H:\ was a network mount, but if it is a physically attached disk then there is no reason there should be network IO.

I would guess that time is spent creating transaction snapshots for the SQLite database, but that should show up as either CPU usage or disk usage.

Do you have any guess as to where the time goes?

jarmo commented 9 years ago

Yeah, it is a physical disk, not a network mount. I don't have any guesses, it seemed as if duplicati didn't know that I have more than one core on my CPU or the --thread-priority is always very low.

I myself used --thread-priority=high as a setting, but it didn't seem to make any difference without specifying that option at all.

I also used --debug-output=true --log-file="h:/duplicati-restore.log" --log-level="Information" as options, but I guess Information level is too high. I could try to restore again with lower log level.

The version I'm used for restoring was 2.0.0.87_preview_2015-07-03, maybe newer version is better?

Can you give me any command line options which might help you more?

jarmo commented 9 years ago

I've cloned this repository and ran Duplicati.Commandline with debugger using VS Express.

Of course I didn't wait until restore process finished (it would have been taken more than 14 days with debugger, I suppose :P), but here's some findings.

It seems that most of the time is spent in SQlite databases;
Initial loading process before actual restoring takes place, takes quite much time. Here's few examples:

- GetFilesAndSourceBlocksFast ~ 275s
- // Fill BLOCKS with remote sources
  var volumes = database.GetMissingVolumes().ToList();
  if (volumes.Count > 0) ~469s, volumes.Count=15 000

But since this happens only (at least that's how it seemed to me), maybe this is not that a big of a problem.

It seems that a lot of time is spent within restore loop:

- foreach(var restorelist in volumekeeper.FilesWithMissingBlocks) ~ 35s
- foreach(var restoremetadata in volumekeeper.MetadataWithMissingBlocks) ~ 26s
- blockmarker.UpdateProcessed(result.OperationProgressUpdater) ~10s
- blockmarker.Commit(result) ~7s

In other words, spending time in SQlite for every block (50MB?) takes at least an extra minute and since I have 15 000 files (volumes.Count) to process then it takes roughly ~10 days.

Looking at the code, it seems that many complex select statements are performed with multiple sub-queries etc. and none of the (at least) temporary tables have any indexes.

Maybe the problem isn't just the lack of indexes, but there's also some SQlite locks involved?

Anyway, I bet that all this problem exists only for large backups and doesn't happen in usual development environment.

I can't investigate this problem any further right now and I don't feel myself quite home with VS, but I hope that some of this information helps.

jarmo commented 9 years ago

Maybe here's something useful for optimizing queries in SQlite https://www.sqlite.org/optoverview.html

JF28 commented 9 years ago

Hi,

I encounter exactly the same problem : restoration takes ages without the database.

I configured a backup with Duplicati (v2.0.0.87) between two servers. Duplicati sends its zipped files from the server A to server B via SFTP. No problem for the backup itself, it works smoothly daily (size of data to back up = 160GB divided in 30000 files).

Problem arises when I want to restore files on server B (therefore without the database, which is located on server A). It seems that Duplicati literally wastes lots of time just to rebuild the database. In my last attempt, I left Duplicati run for two days on server B. After two days, the database was still not completely rebuilt, and therefore absolutely no file could be restored in 48 hours. Crazy !

Now, if I manually sends the database from server A to server B via SFTP, and rerun Duplicati on server B with the proper argument --dbpath, files restoration immediately starts at an acceptable speed.

So, I have two questions for Kenneth.

1) Why does it take so long to rebuild the database on server B ? dblocks files are available on local high-end hard disks (6Gb/s SAS), so there is nothing to download on network. CPU (Xeon E5-2440v2, 8 cores) is not a concern either, nor RAM (32GB). Duplicati has plenty of resources available to rebuild the database, but uses very little of it, and it seems to get bogged down in endless SQL queries (profiling log just shows countless SELECT queries, with few INSERT statements).

2) As a workaround, is it not conceivable that Duplicati saves his SQLite database on the backend at the end of each backup, so that the database is immediately available if a restoration must be performed on a computer other than the one that made the backup ?

Thank you !

KiLLeRRaT commented 9 years ago

The way I'm doing my backup is to create my entire backup on the server being backed up, including the local DB in the same folder.

Once the backup is completed, I have a robocopy script run which takes all that and copies it to my backup location (including the local DB).

When I restore, I specify the local DB which resides on the backup location and everything works reasonably quickly....

Perhaps something to look at doing, always copy across the local DB to the backup location and then when restoring:

Check local location for Local DB
Check backup location for Local DB
Rebuild Local DB

On Mon, Nov 9, 2015 at 9:07 AM JF28 notifications@github.com wrote:

Hi,

I encounter exactly the same problem : restoration takes ages without the database.

I configured a backup with Duplicati (v2.0.0.87) between two servers. Duplicati sends its zipped files from the server A to server B via SFTP. No problem for the backup itself, it works smoothly daily (size of data to back up = 160GB divided in 30000 files).

Problem arises when I want to restore files on server B (therefore without the database, which is located on server A). It seems that Duplicati literally wastes lots of time just to rebuild the database. In my last attempt, I left Duplicati run for two days on server B. After two days, the database was still not completely rebuilt, and therefore absolutely no file could be restored in 48 hours. Crazy !

Now, if I manually sends the database from server A to server B via SFTP, and rerun Duplicati on server B with the proper argument --dbpath, files restoration immediately starts at an acceptable speed.

So, I have two questions for Kenneth.

1) Why does it take so long to rebuild the database on server B ? dblocks files are available on local high-end hard disks (6Gb/s SAS), so there is nothing to download on network. CPU (Xeon E5-2440v2, 8 cores) is not a concern either, nor RAM (32GB). Duplicati has plenty of resources available to rebuild the database, but uses very little of it, and it seems to get bogged down in endless SQL queries (profiling log just shows countless SELECT queries, with few INSERT statements).

2) As a workaround, is it not conceivable that Duplicati saves his SQLite database on the backend at the end of each backup, so that the database is immediately available if a restoration must be performed on a computer other than the one that made the backup ?

Thank you !

— Reply to this email directly or view it on GitHub https://github.com/duplicati/duplicati/issues/1391#issuecomment-154861948 .

Regards, Albert

www.gouws.org | blog.gouws.org | www.flickr.com/albert_gouws

jarmo commented 9 years ago

This is not exactly the same - mine took ages even with the database. As I already wrote, there seem to be problems with (lack of) indexes in that database.

Also, exposing database to the 3rd location includes slight privacy risk - all paths with filenames are in the database. It's not a problem when not using encryption and privacy does not matter that much.

I have plans to look more into that problem, but don't know when I get time to do that yet.

kenkendk commented 8 years ago

@jarmo Thanks for the measurements. I have not optimized the queries for the restore process, but I will take a look at the problematic ones that you discovered.

The FilesWithMissingBlocks and MetadataWithMissingBlocks queries are almost identical, so they can be optimized in one go. The UpdateProcessed query is a count, which I think thank be extracted from the actual update queries so that time goes away completely.

The Commit call is likely not possible to address, as that is what writes the database to disk.

kenkendk commented 8 years ago

I made a new build with some added indexes: http://updates.duplicati.com/preview/duplicati-2.0.0.91_preview_2015-11-18.zip

Could you try it out, preferably with the debugger attached so we can see if the indexes helped?

verybadsoldier commented 8 years ago

I am also experiencing very very slow restore performance here. I am using Windows 8.1 x64 machine to restore my 2 GB backup from Google Drive. After 1 hour it had restored about 84 MB only. So at that pace it would take about 24 hours to restore the whole thing. Also repairing the database did not make a difference in performance. Any help for me?

It says "You are currently running Duplicati 2.0.0.91- 2.0.0.91_preview_2015-11-18

EDIT: I can see it maxing out one CPU core though.

verybadsoldier commented 8 years ago

It is also slow when restoring from a local storage backup instead of Google Drive. So it really seems to have nothing to do with the internet connection but solely with CPU usage for some reason.

mach-o commented 8 years ago

Same issue here.

I backed up 58GB from a Ubuntu laptop to a remote Raspberry Pi device, which took 17 hours, and then tried to locally restore the backup, as a trial run for the scenario where my laptop suffers some unfortunate mishap.

The "No local database, building a temporary database" phase took around 22 hours, then "Searching backup 0" took around 5 hours and (rightly) reported about ½ million files to restore. The restore has now been stuck on "Scanning local files for needed data..." for about two days. Although the directory structure has been restored, no files have yet been.

This is using the git master version from about five days ago, which reports itself as "Version: 2.0.0.7 - DEBUG". I admit to being largely ignorant of Duplicati's internal workings, and the Pi is certainly not a processing powerhouse, but it's not clear to me why unzipping a copy of the backup would need to take this long. I backed up with --no-encryption, which I would assume should also speed things up.

JF28 commented 8 years ago

Just for information, my own feedback relative to my own situation (see my previous message). Even with the latest version of Duplicati (git-master), restoration from a purely local backend (again, see my previous message) takes a huge time. Even worse : I said previously that avoiding the complex rebuild process of the database by backing up the sqlite file on the backend was a good workaround to speed the restoration up tremendously. In fact, I was compeltely wrong. It worked just once. All restore tests I did subsequently have been failures, with or without database rebuilding : each restoration test seems just to sink in the mud (I had to give up systematically after several days of waiting, with virtually no file restored). I'm in the same situation as jarno : restoration takes ages, even with the availability of the database locally.

Sadly, it seems that Duplicati can not restore "large" backup (consisting of tens of gigabytes of data).

mach-o commented 8 years ago

Thanks for that information, @JF28. I have the luxury of being able to let my restore attempt run indefinitely, so I will try to write back here if it ever finishes (or progresses).

verybadsoldier commented 8 years ago

@JF28 For me it does not even seem to be related to huge backups: As said above, restoring just 2GB of data takes about 24 hours for me (local db). Does anyone know maybe an alternative to Duplicati? This restore problem is quite a disadvantage for me.

kenkendk commented 8 years ago

The "Scanning local files ..." is checking to see if any of the files on your disk has the blocks needed, so it can avoid downloading remote data. You can disable this with --no-local-blocks.

The restore process can be optimized in many places, but I have not gotten around to looking at it just yet.

As for alternatives, I think this is as close to a definitive list as possible (although the Duplicati entry is not updated): https://en.wikipedia.org/wiki/List_of_backup_software

mach-o commented 8 years ago

@verybadsoldier : The best alternative will depend on your particular needs and platform(s). I personally want free software that's straightforward to setup and that makes encrypted, scheduled, differential backups on Linux, so helping to improve Duplicati to get it there seems my best option. I've disqualified all the other tools that I've seriously considered, namely because:

rsync, though otherwise attractive and elegant, isn't designed for differential backups;
duplicity was crashy even at its initial steps when I tried to use it;
backuppc isn't designed for encrypted backups, and seems aimed more at larger network administration than personal use; and
bacula has a monumental learning curve that I am very happy not to have to subject myself to.

mach-o commented 8 years ago

After running the restore I mentioned above for about four days, extrapolating the full restore time based on the progress indicator suggests that it will take about four years to complete. Oof! Looks like it's back to rsync for me for the moment.

kenkendk commented 8 years ago

Ouch!

It will be a while before I can look at the restore speed.

Do you have any idea where the time is used? How many files? How big is the backup? What kind of disk and CPU are you using?

mach-o commented 8 years ago

58GB in ½ a million files, using an ARM11 processor (Raspberry Pi) on an external USB drive (Western Digital). Not sure where the time is used.

Incidentally, I should correct my backup software comparison above, by noting that rsync is, in fact, designed to do differential backups, using its --link-dest option.

FootStark commented 8 years ago

Hi @kenkendk ,

restore process is still awfully slow. I found out that the update on progress takes several seconds.

When using an index with (FileId, Index) on the update table, it works fine. It sped up my restore from 2 hours to 3 minutes (250 MB). EDIT: Checked with release build against a real backup: 1GB, 14.000 files. from 15hours down to 2 minutes.

I'm new to GitHub, so I post the change here. Hope you can incorperate it soon.

LocalRestoreDatabase.cs:Line_769 m_updateTable = "UpdatedBlocks-" + Library.Utility.Utility.ByteArrayAsHexString(Guid.NewGuid().ToByteArray()); m_insertblockCommand.ExecuteNonQuery(string.Format(@"CREATE TEMPORARY TABLE ""{0}"" (""FileID"" INTEGER NOT NULL, ""Index"" INTEGER NOT NULL, ""Hash"" TEXT NOT NULL, ""Size"" INTEGER NOT NULL, ""Metadata"" BOOLEAN NOT NULL)", m_updateTable)); m_insertblockCommand.ExecuteNonQuery(string.Format(@"CREATE INDEX ""{0}_FileIdIndexIndex"" ON ""{0}"" (""FileId"", ""Index"")", m_updateTable)); // not necessary: m_insertblockCommand.ExecuteNonQuery(string.Format(@"CREATE INDEX ""{0}_HashSizeIndex"" ON ""{0}"" (""Hash"", ""Size"")", m_updateTable)); // not necessary: m_insertblockCommand.ExecuteNonQuery(string.Format(@"CREATE INDEX ""{0}_IndexIndex"" ON ""{0}"" (""Index"")", m_updateTable));

kenkendk commented 8 years ago

Great find!!

I will make sure to merge it in. Is the Hash-Size index not being used?

kenkendk commented 8 years ago

It looks like it is used in line 842, but perhaps it has no real impact on the overall time.

kenkendk commented 8 years ago

Also, there is another performance fix in issue #1569 which reduces the time to rebuild the local database.

JF28 commented 8 years ago

Can't wait the next preview release ^____^

FootStark commented 8 years ago

Index Hash-Size might or might not be used, but i guess it is not. The Index on Index is toxic, because it is most likely no index at all. If I got it right, Index denominates the number of a block in a file. If that is true, it is almost always single digit (FileSizes < 1M). Thus, there is no distinctiveness and any optimization attempt on this results in a forced nested loop. Depending on how sophisticated the stats and execution plan optimization of SQLite is (awareness of index discintiveness?), I guess it tries to optimize Joins using Index-Index, because it is smaller (single field, Int) and thus appears better suited than Hash-Size. I can't prove that, as I don't know how to retrieve actual execution plans from SQLite (I work with MsSql most of my time). Anyway, Index on FileId and Index is much better suited, as it is small (2 Ints) and thus fast, very disctincitve (unique?), represents the order the rows are added (table is actually clustered), and logically represents the primary join condition to the blocks table (as used in UpdateProcessed and the Update on Commit). This is why this index is sufficient and I removed the other 2.

kenkendk commented 8 years ago

Great, thanks for clarifying! I just wanted to make sure there was a reason, and what you say makes perfect sense.

In case you are interested in the SQLite execution plan, you can get it with EXPLAIN: https://www.sqlite.org/lang_explain.html

Now that you mention MsSQL, do you see a potential benefit from being able to connect to MsSQL? (Not suggesting removing SQLite, but rather to allow alternate RDMS systems).

kenkendk commented 8 years ago

@JF28: Here you go: http://updates.duplicati.com/preview/duplicati-2.0.0.99_preview_2016-02-15.zip

kenkendk commented 8 years ago

@jarmo, @verybadsoldier, @mach-o: can you retry your setups, and see if we are within a tolerable timeframes now?

FootStark commented 8 years ago

Hmm, from the top of my head, I think SQLite is a solid choice for this application. There are not many benefts to leverage from other DBMS. Some pros and cons:

Pro others:

Better multithreading Support
Better concurrency engines (Client-Server)
More sophisticated optimizations
Better development environments, Easy (external) Performance monitoring

Con others:

Additional installations necessary
More Overhead and complexity in Code
Additional sources for errors

I do not see, where the Pros do actually apply here (e.g. concurrent db Access). From my perspective, because of the rather small datasets of Duplicati (in DBMS terms), the cons outweigh the Advantages by far. It would be better to fully leverage the possibilities with SQLite (e.g. allow to backup database file after main backup is done, solves any rebuild issues). Small Performance issues like above can be tracked and removed as necessary.

tl:dr; Stick with SQLite solely.

kenkendk commented 8 years ago

@FootStark: Thanks!

FootStark commented 8 years ago

@kenkendk Thank you for pointing me to EXPLAIN, it's very useful! Out of curiousity i put it to a test on the problem at hand, and it actually proves that IndexIndex was used prior to the fix (line 4). That means future results should be consistently faster.

0   0   0   SEARCH TABLE Blocks-04DFA3B70EAB5247B6FC3946AE24CF59 USING INDEX Blocks-04DFA3B70EAB5247B6FC3946AE24CF59_RestoredMetadataIndex (Restored=?)
0   0   0   EXECUTE LIST SUBQUERY 1
1   0   0   SCAN TABLE Blocks-04DFA3B70EAB5247B6FC3946AE24CF59
1   1   1   SEARCH TABLE UpdatedBlocks-5F63D042E260E249B6AF4221AA192477 USING INDEX UpdatedBlocks-5F63D042E260E249B6AF4221AA192477_IndexIndex (Index=?)
0   0   0   SEARCH TABLE Blocks-04DFA3B70EAB5247B6FC3946AE24CF59 USING INTEGER PRIMARY KEY (rowid=?)

verybadsoldier commented 8 years ago

Great stuff guys! Just restored 3.4 GB from Google Drive within ~4 minutes! Big thanks!

mach-o commented 8 years ago

Great news, I'll try it when I have a minute!

jarmo commented 8 years ago

I've been trying to restore with existing local database using version from master and it has restored ~5.5GB out of 760GB in a hour. Calculating roughly it will take about 10 days to restore it all. In other words, it seems to be slightly faster (I'm also running it in Visual Studio Express using debug mode with all breakpoints disabled, but it might still affect performance).

I just got an idea, which might (or not) improve performance speed even more - what if we delete the "used" rows from temporary tables so that the tables will get smaller after each file restored? Would that idea even work in general?

Although, I'm not too familiar with inner workings of Duplicati, and I have only around 50000 files to be restored (which is not that big of an amount really) so maybe there are still better places to optimize.

When I have more time, I might look into it again in more detail, but can't see it happening in the near future.

I will leave the restore process running for about 24h to see results after that.

PS! I'm using exactly the same backup with exactly the same local database as the real problem happened - I left these untouched just in case I need to do something like this I'm doing right now.

jarmo commented 8 years ago

Ok, it has been restoring about 21 hours now and it's fully restored ~200GB. Calculating from that, I can get a rough estimate of about 4 days or slightly less.

I'd say this is much better for such a big backup :)

Although, there's still much room for improvement (e.g. if you compare it to just with copying that amount of data).

Still, big thank you to everyone involved :)

FootStark commented 8 years ago

@jarmo Thank you for sharing your test results. Can you ellaborate a little on the terms of your test? To be precise: What storage was involved for Source, Backup and Target (local disk, USB, Online, ...)? Was there source data still available (duplicati uses the original source files if they are accessible), already target data present, or was everything downloaded (restored from backup files)? There should be a (rather large) log file Duplicati.debug.log in your bin folder. Can you sanatize (remove filenames) and share it to reproduce where the time was used? And do you remember if a single thread of CPU was maxed out all of the time on duplicati's process? Also, you may want to consider running another test with the release build preview version, that would deliver a better measure for achievable speed.

jarmo commented 8 years ago

@FootStark I have already written about in great detail starting from first comment in this thread or even more important comment here https://github.com/duplicati/duplicati/issues/1391#issuecomment-148520207.

TLDR; I lost all data and had to perform full restore from local physical (SATA 2) hard drive to another (SATA 2) hard drives. In other words, no network was involved.

At first restore process was impossible, because rebuilding local database failed due to some checksum errors, but later I managed to get a local database backup so I could use --dbpath option, which resulted about 14 days of restoration process. I did not use PC at the time and it did not use a lot of resources according to resource monitor (only about 12% of total CPU, and no single thread/core was maxed out).

What do you mean that using release build preview version would deliver a better speed than using version from master? Do you mean that debugger will slow it down so much?

FootStark commented 8 years ago

I did read it, but I was not sure what your current setup is. Now that the data is back I did not assume you deleted it all again. So the following could apply: Assume Backup-Location is E:\BCK, Original data location was D:\Data. Last time (real desaster) you restored all your data to the original folders (D:\Data) where they were backupped from earlier. This time you tried to restore in an alternate folder (e.g. D:\Test), as the original folder already exists. In this case Duplicati would copy data between the original and restored files (D:\Data --> D:\Test), and not restore it from the backup in E:\BCK (except modified data). And that's a whole different area of code. Regarding the debugger: It slows down execution only moderately as long as no conditional breakpoints are used. But in debug mode and without optimization it could still be somewhere at only half speed to release. For a real world measurement on processing speed, it should not be used. The goal is to see what is really achievable right now. And there something like 10 - 20 MB/sec would be good (at least for how I would want to use it).

jarmo commented 8 years ago

Okay, to clarify - I know that Duplicati tries to use existing blocks already so I moved everything away from these folders so Duplicati would think that it needs to restore everything. That's how I got similar setup as losing really everything as happened before. I will give it a go with precompiled binary at some point too to see if there's much of a difference.

kenkendk commented 8 years ago

I doubt the debugging makes much difference. It has very low impact in my measurements. The SQLite version can have a huge impact.

The --no-local-blocks option disables reuse of local data, so combined with --no-local-db it should be the same as restore-from-scratch.

I know there is at least one potential optimization in the restore: restore blocks in zip order instead of db order. This will cause read-ahead and other things to work much better. I am sure there are other database optimizations as well.

FootStark commented 8 years ago

OK, I might take another look at the restore process on the weekend. Gotta build a suitable large backup for testing first. My current one is too small, and it contains a big chunk of executable data, which produces a heavy workload for malware protection. Which leads to another question: maybe restore files with a work suffix (e.g. .restoring) and rename all in the end?

kenkendk commented 8 years ago

@FootStark Would you restore to a temp folder and then move in, or just the individual files?

There are pros and cons to using a temp folder. Pros:

No interference to programs using the files
No chance for files being accessed in partial state
No programs will update the files during the restore

Cons:

Extra time spent for moving
Extra space used to store files (if originals exist)
Needs to copy data from existing files (if they exist)
Needs to re-apply metadata after moving (to preserve timestamps)

But if it solves in issue for you, I would be happy to include it.

duplicati / duplicati

Restoring without local database takes ages #1391