dandi / dandisets

749 Dandisets, 813.7 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

something is odd with dropbox rclone remote #141

Closed yarikoptic closed 2 years ago

yarikoptic commented 2 years ago

Maybe it ran out of space? From today's email:

move sub-813701555/sub-813701555_ses-835479236_probe-837761708_ecephys.nwb ok                                                                                                        
move sub-817060743/sub-817060743_ses-839068429_probe-868929135_ecephys.nwb (remote currently unavailable or git-annex-remote-rclone failed to parse rclone output) failed            
move sub-800250054/sub-800250054_ses-821695405_probe-822645899_ecephys.nwb (remote currently unavailable or git-annex-remote-rclone failed to parse rclone output) failed            
move sub-817060743/sub-817060743_ses-839068429_probe-868929140_ecephys.nwb (remote currently unavailable or git-annex-remote-rclone failed to parse rclone output) failed            
move sub-803390283/sub-803390283_ses-831882777_probe-832810578_ecephys.nwb ok                                                                                                        
move sub-817060743/sub-817060743_ses-839068429_probe-841435557_ecephys.nwb (remote currently unavailable or git-annex-remote-rclone failed to parse rclone output) failed   

So some files did transfer OK but others did not. The output doesn't say which dataset this is (attn @jwodder -- should be shown); I figured out it is 000022.

Similar messages appeared in other logs today as well.
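To see at a glance how many moves failed in a log like the one above, the result lines can be tallied. This is only a minimal sketch: it assumes each result is a single line of the form `move <path> [(reason)] ok|failed`, as in the excerpt, and that paths contain no spaces. `summarize_moves` is a hypothetical helper, not part of backups2datalad.

```python
import re
from collections import Counter

# Matches result lines such as
#   move sub-1/a.nwb ok
#   move sub-2/b.nwb (remote currently unavailable ...) failed
# Assumption: paths contain no whitespace.
MOVE_LINE = re.compile(
    r"^move\s+(?P<path>\S+)(?:\s+\((?P<reason>.*)\))?\s+(?P<status>ok|failed)\s*$"
)


def summarize_moves(log_text):
    """Return (Counter of statuses, list of (path, reason) for failed moves)."""
    statuses = Counter()
    failures = []
    for line in log_text.splitlines():
        m = MOVE_LINE.match(line)
        if m is None:
            continue  # not a move result line
        statuses[m["status"]] += 1
        if m["status"] == "failed":
            failures.append((m["path"], m["reason"]))
    return statuses, failures
```

Feeding it the body of the cron email would give the ok/failed counts and the list of affected paths, which makes it easier to check whether the failures cluster on particular subjects.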

jwodder commented 2 years ago

@yarikoptic

> attn @jwodder -- should be shown

There's an INFO-level log message "Moving assets for Dandiset %s to backup remote" that should be shown before the git-annex move output. Is it not included in the e-mail?

yarikoptic commented 2 years ago

I ran fsck on one of those files and everything seems to be OK; rerunning move also didn't move anything. Odd.

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000022$ git annex fsck --from dandi-dandisets-dropbox --key SHA256E-s1900372495--44f06eea609701c7f7d5bba7ff3483c756883a3e10b2e1e64b78f67ceed521ec.nwb
fsck SHA256E-s1900372495--44f06eea609701c7f7d5bba7ff3483c756883a3e10b2e1e64b78f67ceed521ec.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 858.662 MBytes (900372495 Bytes)

(checksum...) ok
(recording state in git...)

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000022$ git annex move -J6 --all --to dandi-dandisets-dropbox   --not --in dandi-dandisets-dropbox  --and --in here
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000022$

yarikoptic commented 2 years ago

> There's an INFO-level log message "Moving assets for Dandiset %s to backup remote" that should be shown before the git-annex move output. Is it not included in the e-mail?

nope

The emails look like this:

```shell
Date: Thu, 17 Mar 2022 16:55:22 -0400
From: Cron Daemon
To: dandi@drogon.datalad.org
Subject: Cron chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-cron'
>> python -m tools.backups2datalad -l WARNING -J 5 --target /mnt/backup/dandi/dandisets update-from-backup --backup-remote dandi-dandisets-dropbox --gh-org dandisets -e '000108$'
move sub-666414034/sub-666414034_ses-669047103_icephys.nwb ok
move sub-663393400/sub-663393400_ses-666218401_icephys.nwb ok
move sub-666414024/sub-666414024_ses-668264924_icephys.nwb ok
move sub-666414066/sub-666414066_ses-668444294_icephys.nwb ok
....
```
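One thing visible in the command line above is `-l WARNING`. Assuming that flag sets a standard Python `logging` threshold (an assumption; the tool's actual logging setup is not shown in this thread), an INFO-level record would be filtered out before it ever reached the email. The sketch below only demonstrates that standard-library behavior; the logger name and the WARNING message text are made up for illustration.

```python
import io
import logging


def captured_output(level):
    """Emit one INFO and one WARNING record at the given threshold.

    Returns the text that actually reached the handler.
    """
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{level}")  # hypothetical logger name
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(level)
    logger.info("Moving assets for Dandiset %s to backup remote", "000022")
    logger.warning("example warning")  # made-up message
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(captured_output(logging.WARNING)))
    print(repr(captured_output(logging.INFO)))
```

With the threshold at `logging.WARNING` only the warning line is captured; at `logging.INFO` both lines are. Whether this is what happened here depends on how the message is emitted in the deployed version of the tool.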

jwodder commented 2 years ago

@yarikoptic Looks like PR #133 wasn't pulled in somehow.

yarikoptic commented 2 years ago

Maybe, but I think it was pulled in: there is a DEBUG entry for the `populate dandi-dandisets-dropbox` invocation. I bounced you an email from yesterday; since then there have been no new emails,

and there is a backup process hanging there:

```shell
13545 root 20 0 48828 2696 2312 S 0.0 0.0 0:00.00 │ └─ /usr/sbin/CRON -f
13546 dandi 20 0 4280 704 640 S 0.0 0.0 0:00.00 │ └─ /bin/sh -c chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-cron
13547 dandi 20 0 30804 9404 4124 S 0.0 0.0 0:22.91 │ └─ /usr/bin/perl /usr/bin/chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c /mnt/backup/dandi/dandisets/tools/backup
13548 dandi 20 0 10080 760 672 S 0.0 0.0 0:00.00 │ └─ /usr/bin/flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-cron
13549 dandi 20 0 11256 2904 2584 S 0.0 0.0 0:00.00 │ └─ /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-cron
13580 dandi 20 0 11256 1684 1360 S 0.0 0.0 0:00.00 │ └─ /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-cron
26938 dandi 20 0 168M 64984 9660 S 0.0 0.1 0:02.03 │ └─ python -m tools.backups2datalad -l DEBUG -J 5 --target /mnt/backup/dandi/dandisets populate dandi-dandisets-dropbox
12992 dandi 20 0 1.0T 305M 37136 S 26.1 0.5 5h20:06 │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
15517 dandi 20 0 26348 4848 4460 S 0.0 0.0 0:00.00 │ ├─ git --git-dir=.git --work-tree=. --literal-pathspecs -c annex.retry=3 cat-file --batch
13124 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:00.01 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13112 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13083 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13066 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13056 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:58.67 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13043 dandi 20 0 1.0T 305M 37136 S 5.9 0.5 1h12:46 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13031 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:58.09 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13030 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:27.20 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13029 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:00.00 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13028 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:20.92 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13027 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:41.01 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13026 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 2:25.02 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13025 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:33.06 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13024 dandi 20 0 1.0T 305M 37136 S 6.5 0.5 1h11:45 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13023 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:35.06 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13022 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:45.23 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13021 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:24.26 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13018 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:47.35 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13017 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:37.15 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13016 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:35.54 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13009 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:22.51 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13007 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:57.17 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13006 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:27.73 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13005 dandi 20 0 1.0T 305M 37136 S 6.5 0.5 1h12:31 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13004 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:34.31 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13003 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:25.39 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
13002 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:14.95 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
12997 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 1:10.04 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
12995 dandi 20 0 1.0T 305M 37136 S 5.9 0.5 1h12:10 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
12994 dandi 20 0 1.0T 305M 37136 S 0.0 0.5 0:57.07 │ ├─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
12993 dandi 20 0 1.0T 305M 37136 S 1.3 0.5 1:49.25 │ └─ git-annex get -c annex.retry=3 --jobs 5 --from=web --not --in dandi-dandisets-dropbox --and --not --in here
```

without any obvious network activity, so I guess it is just stuck. git-annex is at 8.20211028-g1c76278, and IIRC there have been some fixes for parallel `git annex get` operation since then, but I am a bit reluctant to upgrade since the most recent versions have their own issues. FWIW, running the corresponding `git annex find` in 000026 gives only a single file:

(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex find --not --in dandi-dandisets-dropbox --and --not --in here
derivatives/EPIC/sub-EXC022/sub-EXC022_ses-MRI_flip-4_VFA.nii.gz

So maybe it is indeed some race condition across this many get jobs. I will interrupt that backup run, do a manual get, and restart the backup run manually to see how far we get.
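A run that hangs like this could be bounded with a wall-clock timeout so that cron does not keep a stuck process around indefinitely. A minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper (not something backups2datalad is known to provide), and any timeout value would be an arbitrary choice.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s, cwd=None):
    """Run argv and return its exit code, or None if it ran longer than timeout_s.

    On timeout, subprocess.run kills the direct child and waits for it.
    Processes spawned by that child are not necessarily terminated.
    """
    try:
        return subprocess.run(argv, cwd=cwd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a hung command: a child that sleeps longer than the timeout.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

For the case above, `argv` would be the `git-annex get -c annex.retry=3 --jobs 5 --from=web ...` command from the process listing, with `cwd` set to the Dandiset directory. Note that a wall-clock limit is a blunt tool: it cannot tell a stalled transfer from a slow one, so it would need to be generous for large files.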

yarikoptic commented 2 years ago

Haven't seen anything like that recently, so let's be blissfully happy and hope in our ignorance that the issue is not a problem for us.