cjnaz / rclonesync-V2

A Bidirectional Cloud Sync Utility using rclone
MIT License
356 stars 39 forks source link

Exception in load_list due to out of memory for huge number of files #59

Closed cjnaz closed 4 years ago

cjnaz commented 4 years ago

Issue moved from #8.

@Jayashree-nach, thanks for uploading your debug files. Your LSL file lists have 1,960,936 files. It takes about 30 sec for load_list to load one lsl into memory and occupies just under 1GB or RAM on my Linux box, fyi. About 40 sec on my Window's box.

I've finding a disturbing problem in your Path2 LSL file. You noted that the file is 24 lines longer than the Path1 file. I find that all of these 24 lines are re-listings of files, some with different datetime stamps and file sizes, and for one file it is listed five times with different date stamps and sizes:

Your Path2 (remote:eMR-Testing) lsl file:
2020-09-01 15:48:49,214:  Found duplicate line in LSL file:  <    44288 2020-07-27 17:05:43.329000000 OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  First found:                       <   254331 2020-08-19 10:58:25.996000    OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  Deltas:                                210043 bytes       1965162.667000 sec
2020-09-01 15:48:49,214:  Found duplicate line in LSL file:  <    44288 2020-07-27 17:05:07.634000000 OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  First found:                       <   254331 2020-08-19 10:58:25.996000    OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  Deltas:                                210043 bytes       1965198.362000 sec
2020-09-01 15:48:49,214:  Found duplicate line in LSL file:  <   193826 2020-07-27 17:03:46.784000000 OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  First found:                       <   254331 2020-08-19 10:58:25.996000    OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  Deltas:                                 60505 bytes       1965279.212000 sec
2020-09-01 15:48:49,214:  Found duplicate line in LSL file:  <   193826 2020-07-27 17:00:13.550000000 OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  First found:                       <   254331 2020-08-19 10:58:25.996000    OP/-OP-000.jpeg>
2020-09-01 15:48:49,214:  Deltas:                                 60505 bytes       1965492.446000 sec
2020-09-01 15:48:49,267:  Found duplicate line in LSL file:  <   154195 2020-07-22 10:41:55.258000000 22072020/OP22072020/22072020-OP-001.jpeg>
Path1 only shows the 2020-08-19 version, fyi.

One file with a small time difference:
2020-09-01 15:48:49,267:  Found duplicate line in LSL file:  <   154195 2020-07-22 10:41:55.258000000 22072020/OP22072020/22072020-OP-001.jpeg>
2020-09-01 15:48:49,267:  First found:                       <    36103 2020-07-22 10:42:45.102000    22072020/OP22072020/22072020-OP-001.jpeg>
2020-09-01 15:48:49,267:  Deltas:                               -118092 bytes            49.844000 sec

And 19 files like...
2020-09-01 15:48:57,910:  Found duplicate line in LSL file:  <   245547 2017-08-12 15:26:20.293000000 056199/OP056199/056199-OP-013.jpeg>
2020-09-01 15:48:57,910:  First found:                       <   245547 2017-08-12 03:26:20.293000    056199/OP056199/056199-OP-013.jpeg>
2020-09-01 15:48:57,910:  Deltas:                                     0 bytes             0.000000 sec

When you do rclone lsl remote:eMR-Testing\OP\-OP-000.jpeg what is returned? Is it repeatable/consistent?

What remote service are you using for remote: on Path2? Is there an rclone switch for that remote for not listing older versions? This may be an rclone bug.

In the --first-sync snip you posted I see that 064195/OP064195/064195-OP-019.jpeg was deleted by the rclone sync (last step in rclonesync --first-sync), which would happen if that file were created after the initial Path2 lsl was captured. Is it correct that files are actively being changed on Path2 at the same time that you are running rclonesyncs? Such deletes should only happen during a first-sync, fyi. For non-first-sync runs, a new file after the initial lsl would be noticed on the next sync run.

Also, to test memory, I loaded your Path1 and Path2 LSL files 6 times. Memory usage on my Linux box went from 5.3GB to 11.1GB, with no errors/crashes. (The box has 16 GB RAM.) The current rclonesync algorithm holds four copies of the sync files in memory at once. I will optimize this to only two copies in the next release, but even six loaded copies doesn't seem to kill it on my boxes. My 16GB 64b Windows 10 box went from 12.2GB up to 15.3GB and did some swap to disk.

How much memory do you have on your failing Windows box, and what Windows version and 32b/64b?

In your non-first-sync-exception snip I assume you added a print of the out var to see how far into the Path2 file it got before the exception. I see this line at 1.37 million lines in. I have no clue why no exception info is printed.

I plan to post a V3.1 with a couple main changes

Jayashree-nach commented 4 years ago

Q: How much memory do you have on your failing Windows box, and what Windows version and 32b/64b?

My windows configuration : Windows Server 2012 - 12.0 GB (RAM) - 64b

On Wed, Sep 2, 2020 at 6:07 AM Chris notifications@github.com wrote:

Issue moved from #8 https://github.com/cjnaz/rclonesync-V2/issues/8.

@Jayashree-nach https://github.com/Jayashree-nach, thanks for uploading your debug files. Your LSL file lists have 1,960,936 files. It takes about 30 sec for load_list to load one lsl into memory and occupies just under 1GB or RAM on my Linux box, fyi. About 40 sec on my Window's box.

I've finding a disturbing problem in your Path2 LSL file. You noted that the file is 24 lines longer than the Path1 file. I find that all of these 24 lines are re-listings of files, some with different datetime stamps and file sizes, and for one file it is listed five times with different date stamps and sizes:

Your Path2 (remote:eMR-Testing) lsl file: 2020-09-01 15:48:49,214: Found duplicate line in LSL file: < 44288 2020-07-27 17:05:43.329000000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: First found: < 254331 2020-08-19 10:58:25.996000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: Deltas: 210043 bytes 1965162.667000 sec 2020-09-01 15:48:49,214: Found duplicate line in LSL file: < 44288 2020-07-27 17:05:07.634000000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: First found: < 254331 2020-08-19 10:58:25.996000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: Deltas: 210043 bytes 1965198.362000 sec 2020-09-01 15:48:49,214: Found duplicate line in LSL file: < 193826 2020-07-27 17:03:46.784000000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: First found: < 254331 2020-08-19 10:58:25.996000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: Deltas: 60505 bytes 1965279.212000 sec 2020-09-01 15:48:49,214: Found duplicate line in LSL file: < 193826 2020-07-27 17:00:13.550000000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: First found: < 254331 2020-08-19 10:58:25.996000 OP/-OP-000.jpeg> 2020-09-01 15:48:49,214: Deltas: 60505 bytes 1965492.446000 sec 2020-09-01 15:48:49,267: Found duplicate line in LSL file: < 154195 2020-07-22 10:41:55.258000000 22072020/OP22072020/22072020-OP-001.jpeg> Path1 only shows the 2020-08-19 version, fyi.

One file with a small time difference: 2020-09-01 15:48:49,267: Found duplicate line in LSL file: < 154195 2020-07-22 10:41:55.258000000 22072020/OP22072020/22072020-OP-001.jpeg> 2020-09-01 15:48:49,267: First found: < 36103 2020-07-22 10:42:45.102000 22072020/OP22072020/22072020-OP-001.jpeg> 2020-09-01 15:48:49,267: Deltas: -118092 bytes 49.844000 sec

And 19 files like... 2020-09-01 15:48:57,910: Found duplicate line in LSL file: < 245547 2017-08-12 15:26:20.293000000 056199/OP056199/056199-OP-013.jpeg> 2020-09-01 15:48:57,910: First found: < 245547 2017-08-12 03:26:20.293000 056199/OP056199/056199-OP-013.jpeg> 2020-09-01 15:48:57,910: Deltas: 0 bytes 0.000000 sec

When you do rclone lsl remote:eMR-Testing\OP-OP-000.jpeg what is returned? Is it repeatable/consistent?

What remote service are you using for remote: on Path2? Is there an rclone switch for that remote for not listing older versions? This may be an rclone bug.

In the --first-sync snip you posted I see that 064195/OP064195/064195-OP-019.jpeg was deleted by the rclone sync (last step in rclonesync --first-sync), which would happen if that file were created after the initial Path2 lsl was captured. Is it correct that files are actively being changed on Path2 at the same time that you are running rclonesyncs? Such deletes should only happen during a first-sync, fyi. For non-first-sync runs, a new file after the initial lsl would be noticed on the next sync run.

Also, to test memory, I loaded your Path1 and Path2 LSL files 6 times. Memory usage on my Linux box went from 5.3GB to 11.1GB, with no errors/crashes. (The box has 16 GB RAM.) The current rclonesync algorithm holds four copies of the sync files in memory at once. I will optimize this to only two copies in the next release, but even six loaded copies doesn't seem to kill it on my boxes. My 16GB 64b Windows 10 box went from 12.2GB up to 15.3GB and did some swap to disk.

How much memory do you have on your failing Windows box, and what Windows version and 32b/64b?

In your non-first-sync-exception snip I assume you added a print of the out var to see how far into the Path2 file it got before the exception. I see this line at 1.37 million lines in. I have no clue why no exception info is printed.

I plan to post a V3.1 with a couple main changes

  • Load only 2 lsl lists at a time in non-first-sync mode, down from 4 (and the same as for --first-sync).
  • Improve the load_list logging for duplicates in the file (above example) and the line info for an exception.
  • In the case of duplicate files in the lsl, have load_list keep only the latest version. (Currently it keeps only the last occurrence in the lsl file.) However, I think that finding duplicates in the lsl is a dangerous situation that I want to better understand.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cjnaz/rclonesync-V2/issues/59, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQJJNT44SG3LCT2ZYSMZFJDSDWHWFANCNFSM4QSM7Y2A .

cjnaz commented 4 years ago

And for the other questions?

cjnaz commented 4 years ago

Please give tag version 3.1 a go... https://github.com/cjnaz/rclonesync-V2/tree/V3.1. I've reduced the number of concurrently loaded lsl files to two, same as a --first-sync.

I need your help with the duplicate file entries in your Path2 remote. Please reply to the above remaining questions.

cjnaz commented 4 years ago

@Jayashree-nach, I want to add some benchmark/performance data to the README. I now know that for your 1.96 million files it takes about 30 sec to load the lsl file into rclonesync, and it takes almost 1 GB of RAM (with your relatively short path names).

Would you please gather a few bits of data for me? This data indicates the bare minimum processing time for rclonesync in your setup. Any file transfers would be in addition and dominated by your WAN download/upload speeds.

Assuming V3.1 works for you, would you please upload a non-first-sync run log with --verbose set to https://drive.google.com/drive/folders/1FuHvtoezlesiK4btn0Jr8yhi4VQQ1xOr?usp=sharing.

cjnaz commented 4 years ago

Closed since the submitter has not responded.

V3.1 cut down memory usage by 50% for large number of files cases.