maxtimbo closed this issue 3 years ago
Note that I suspect that the root cause for the duplicates is an rclone or Drive issue/bug. I cannot reproduce it on my Drive account, so I need your help.
I'm submitting an issue at rclone. Please provide some supporting data:
If the LSL file truly does have file revisions, then using the most recent seems to be the correct behavior for rclonesync. You can eliminate all the duplicate warnings by commenting out the logging.warning lines 649 and 651 in rclonesync V3.2 (for now).
I see that you are using a filters file. Please try rclone lsl Prod:/ > <somefile> with and without --filter-from /root/rclonesync/Filters, and check for the duplicate listed files.
This will help isolate the problem by seeing if filtering has any effect.
@maxtimbo - ping
Note that I suspect that the root cause for the duplicates is an rclone or Drive issue/bug. I cannot reproduce it on my Drive account, so I need your help.
I'm submitting an issue at rclone. Please provide some supporting data:
* What rclone version are you running?
  rclone v1.54.0
  - os/arch: linux/amd64
  - go version: go1.15.7
* What rclonesync version are you running?
  rclonesync V3.2 201201
* Confirm that you are using Google Drive.
  It is Google Drive.
* Please provide a grep of the Drive LSL file for one of the duplicate filenames.
* Overall, how many file duplicates are we talking about?
  Can I send you an example output? If I had to guess, I'd say in the triple digits, maybe 100-115 or so. But it may be inflated since every duplicate file prints twice in the log.
* For these files, do you have any insight on how the duplicate versions came to be?
  When I first set this up, since it was a very, very large share, I manually copied everything into the g-drive. I surmise this might be the cause of these dups.
* Please post the log output from rclonesync with --verbose. Feel free to edit out a bunch of redundant duplicate file lines, and sanitize it. If you would rather share offline, please post to your Gdrive and send me a link.
  I'll do this and send you the output.
If the LSL file truly does have file revisions, then using the most recent seems to be the correct behavior for rclonesync. You can eliminate all the duplicate warnings by commenting out the logging.warning lines 649 and 651 in rclonesync V3.2 (for now).
I see that you are using a filters file. Please try rclone lsl Prod:/ > <somefile> with and without --filter-from /root/rclonesync/Filters, and check for the duplicate listed files.
I'll do this with the --verbose tag...
Please do:
rclone lsl Prod:/ | grep one_of_the_duplicate_filenames > no_filter.txt
rclone lsl Prod:/ --filter-from /root/rclonesync/Filters | grep one_of_the_duplicate_filenames > with_filter.txt
If the dup shows up with or without the filter, then the filter is not related to the problem. Just need to confirm.
Please post the problem .txt file here, or upload to drive and share it with me at github@cjnaz.com.
There is a long history with duplicates on Drive, but I've not seen discussion of them showing up in LSLs.
rclone lsl Prod:/ | grep "mystique new.mp3" > no_filter.txt
root@redacted:~/rclonesync-v2# cat ../no_filter.txt
997317 2018-07-26 17:35:42.806000000 redacted/redacted/Spots/mystique new.mp3
997317 2018-07-26 17:23:42.476000000 redacted/redacted/Spots/mystique new.mp3
rclone lsl Prod:/ --filter-from /root/rclonesync/Filters | grep "mystique new.mp3" > with_filter.txt
root@redacted:~/rclonesync-v2# cat ../with_filter.txt
997317 2018-07-26 17:35:42.806000000 redacted/redacted/Spots/mystique new.mp3
997317 2018-07-26 17:23:42.476000000 redacted/redacted/Spots/mystique new.mp3
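Instead of grepping for one filename at a time, the whole listing can be scanned for paths that appear more than once. The following is a minimal sketch, not part of rclonesync; the function name `find_duplicate_paths` is hypothetical, and it assumes each `rclone lsl` line has the form "size date time path", as in the output above.

```python
from collections import defaultdict


def find_duplicate_paths(lsl_text):
    """Return {path: [(size, timestamp), ...]} for paths listed more than once.

    Assumes each line looks like: "<size> <date> <time> <path>".
    """
    seen = defaultdict(list)
    for line in lsl_text.splitlines():
        # Split on whitespace at most 3 times so paths with spaces stay intact.
        parts = line.split(None, 3)
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        size, date, time, path = parts
        seen[path].append((int(size), f"{date} {time}"))
    return {path: entries for path, entries in seen.items() if len(entries) > 1}


# Example usage (the listing file name is hypothetical):
# with open("no_filter_full.txt") as f:
#     for path, entries in find_duplicate_paths(f.read()).items():
#         print(len(entries), path)
```

Running this on a full listing taken with and without the filter would show whether the set of duplicated paths differs between the two.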
I started doing some tests. I have to run with --exclude to avoid some subdirectories as they contain some redundant/very large files. These are defined in the Filters file. In fact, that's pretty much the only thing defined in the Filters file, sub-dirs to avoid.
I posted the issue on the rclone forum: https://forum.rclone.org/t/drive-duplicate-files-reported-by-rclone-lsl/23045
So it seems that the Google Drive duplicates are not important files and can be purged. You should be able to see them on the Drive web interface. You can manually delete them via the web interface, or consider using the rclone dedupe command.
I think it's best for the user to clean up the rubble on Drive rather than taking out the warning messages from rclonesync.
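For the rclone dedupe route, a cautious first pass might use --dry-run so nothing is deleted until the report has been reviewed. The sketch below only builds the command line; the helper name `build_dedupe_command` is hypothetical, and the choice of the `newest` mode is an assumption made to mirror rclonesync's "keeping latest" behavior, not something recommended in this thread.

```python
def build_dedupe_command(remote, mode="newest", dry_run=True):
    """Build an rclone dedupe command as an argument list.

    --dedupe-mode and --dry-run are standard rclone flags; the default
    mode here is an assumption for illustration.
    """
    cmd = ["rclone", "dedupe", "--dedupe-mode", mode, remote]
    if dry_run:
        # Report what would be done without changing anything on the remote.
        cmd.append("--dry-run")
    return cmd


# Example usage: print the command first, then run it once it looks right.
# import subprocess
# cmd = build_dedupe_command("Prod:/")
# print(" ".join(cmd))
# subprocess.run(cmd, check=True)
```

Only after the dry run lists the expected files would it make sense to call the helper again with dry_run=False.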
I run rclonesync every few days via crontab. The output is sent to me via email. I don't think these warnings are breaking anything, but I get like a novella's worth of the same warning: Duplicate line in LSL file, Prior found (keeping latest). Is there a way to cull these warnings without turning them off?
Here's the complete command I use in cron:
/root/rclonesync-v2/rclonesync -f /root/rclonesync/Filters /home/Programming Prod:/ --rclone-args --drive-skip-gdocs
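One way to cull the warnings without editing rclonesync is to pipe the cron output through a small filter that drops the repeated lines and reports a single count instead. This is a minimal sketch, not a feature of rclonesync; the function name `collapse_repeated_warnings` and the script name in the usage comment are hypothetical, and the marker text is taken from the warning quoted above.

```python
import sys


def collapse_repeated_warnings(lines, marker="Duplicate line in LSL file"):
    """Drop every line containing `marker` and append one summary line."""
    kept = []
    count = 0
    for line in lines:
        if marker in line:
            count += 1
        else:
            kept.append(line)
    if count:
        kept.append(f"{count} line(s) containing '{marker}' suppressed")
    return kept


def main():
    # Read the rclonesync output from stdin and print the condensed version.
    lines = sys.stdin.read().splitlines()
    print("\n".join(collapse_repeated_warnings(lines)))


# Example cron usage (script name is hypothetical; 2>&1 is included in case
# the warnings are written to stderr):
# /root/rclonesync-v2/rclonesync ... 2>&1 | python3 collapse_warnings.py
# In that script, call main() at the bottom.
```

The email would then still show that duplicates exist, and how many lines were affected, without repeating the warning for each file.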