dandi / dandisets

730 Dandisets, 807.1 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

some files are not backed up (to dropbox) although script says there is nothing to do #370

Closed: yarikoptic closed 9 months ago

yarikoptic commented 9 months ago
> python -m tools.backups2datalad -l DEBUG --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml populate 000618
2023-11-14T11:32:36-0500 [INFO    ] backups2datalad: Saving logs to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.11.14.16.32.36Z.log
2023-11-14T11:32:36-0500 [DEBUG   ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 config --get dandi.populated [cwd=/mnt/backup/dandi/dandisets/000618]
2023-11-14T11:32:36-0500 [DEBUG   ] backups2datalad: Finished [rc=0]: git -c receive.autogc=0 -c gc.auto=0 config --get dandi.populated [cwd=/mnt/backup/dandi/dandisets/000618]
2023-11-14T11:32:36-0500 [DEBUG   ] backups2datalad: Running: git -c receive.autogc=0 -c gc.auto=0 show -s --format=%H [cwd=/mnt/backup/dandi/dandisets/000618]
2023-11-14T11:32:36-0500 [DEBUG   ] backups2datalad: Finished [rc=0]: git -c receive.autogc=0 -c gc.auto=0 show -s --format=%H [cwd=/mnt/backup/dandi/dandisets/000618]
2023-11-14T11:32:36-0500 [INFO    ] backups2datalad: Dandiset 000618: no need to populate
Logs saved to /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2023.11.14.16.32.36Z.log

although it is easy to see that some files were not pushed to the backup (in the `git annex list` output below, the last column is the dandi-dandisets-dropbox remote: a lowercase `x` means the key is present there, lowercase because the remote is untrusted, and `_` means absent -- so the `__XX__` entry is a file missing from dropbox)

dandi@drogon:/mnt/backup/dandi/dandisets/tools$ git -C ../000618 annex list | head -n 20
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandisets-dropbox (untrusted)
||||||
__XX_x sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-1200s-11_ecephys.nwb
__XX_x sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-1200s-21_ecephys.nwb
__XX__ sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-1200s-31_ecephys.nwb
__XX_x sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-600s-11_ecephys.nwb
__XX_x sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-600s-12_ecephys.nwb
__XX_x sub-hybrid-janelia/sub-hybrid-janelia_ses-hybrid-drift-siprobe-rec-16c-600s-21_ecephys.nwb

I didn't check inside how we decide on whether that copy is worth doing...

here is a sweep through all dandisets showing that we have quite a good number of assets which were not transferred, plus some odd unrelated error I have yet to check:

```shell
dandi@drogon:/mnt/backup/dandi/dandisets$ for ds in 000*; do n=$(git -C $ds annex find --in web --not --in dandi-dandisets-dropbox | grep -v '^|' | wc -l); [ "$n" == "0" ] || echo "$ds $n"; done
000026 6
000037 41
000059 49
000235 2
000238 4
000246 144
000363 159
000409 39
000410 1
000447 1
000458 3
000465 13
000467 508
000477 4
000488 3
000535 3
000540 7
000541 8
000544 2
000547 6
000549 2
000550 3
000552 9
000554 23
000559 11
000561 7
000568 28
000570 1
000571 3
000572 26
000574 17
000576 4
000579 13
000582 1
000618 32
error: object file .git/objects/b6/d1d20e316838110ae3385d8aeff437db79fe9f is empty
fatal: b6d1d20e316838110ae3385d8aeff437db79fe9f is not a valid object
git-annex: user error (git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","b6d1d20e316838110ae3385d8aeff437db79fe9f","--no-gpg-sign"] exited 128)
000624 14
000626 1
000628 31
000630 210
000631 2
000632 7
000635 4
000636 705
000637 12
000640 167
000674 3
000678 6
000687 5
000691 1
000692 9
000696 1
```

@jwodder -- what do you think: how could we have ended up in such a situation where not everything is copied to dropbox, so we could mitigate it with a better understanding of the situation?

jwodder commented 9 months ago

@yarikoptic The "no need to populate" message is emitted if the value of the dandi.populated Git option for the dandiset equals the commit hash of its current HEAD. dandi.populated is updated after every successful run of git-annex copy, which happens when populate is run while dandi.populated is out of date.
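For illustration, the check amounts to roughly the following (a minimal shell sketch of the logic described above, not the actual backups2datalad code; `000618` stands in for any dandiset path, and the exact `copy` invocation is an assumption):

```shell
# Sketch: populate is a no-op when dandi.populated matches the current HEAD.
populated=$(git -C 000618 config --get dandi.populated)
head=$(git -C 000618 show -s --format=%H)
if [ "$populated" = "$head" ]; then
    echo "Dandiset 000618: no need to populate"
else
    # assumed shape of the copy step; on success, record the new HEAD
    git -C 000618 annex copy --fast --from web --to dandi-dandisets-dropbox \
        && git -C 000618 config dandi.populated "$head"
fi
```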

Do the dandisets in question have uncommitted changes? That would be one way for there to be files in the dandiset that haven't been pushed to Dropbox. Other than that, I don't know how this would have happened (aside from blaming git-annex). One possible way to address this would be to give populate a --force option for skipping the dandi.populated check.
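A quick sweep to test that hypothesis might look like this (a sketch; `git status --porcelain` prints nothing for a clean working tree):

```shell
# report any dandiset with uncommitted changes
for ds in 000*; do
    [ -n "$(git -C $ds status --porcelain)" ] && echo "$ds has uncommitted changes"
done
```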

yarikoptic commented 9 months ago

> Do the dandisets in question have uncommitted changes?

I don't think so, or "not necessarily", as running this now shows that samples of those datasets are all nice and clean:

```shell
dandi@drogon:/mnt/backup/dandi/dandisets$ for ds in 000*; do n=$(git -C $ds annex find --in web --not --in dandi-dandisets-dropbox | grep -v '^|' | wc -l); [ "$n" == "0" ] || { echo "$ds $n"; git -C $ds status; }; done
000026 6
On branch draft
Your branch is up to date with 'github/draft'.

It took 10.68 seconds to enumerate untracked files. 'status -uno' may speed it up, but you have to be careful not to forget to add new files yourself (see 'git help status').

nothing to commit, working tree clean
000037 41
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean
000059 49
Refresh index: 100% (106/106), done.
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean
```

> One possible way to address this would be to give populate a --force option for skipping the dandi.populated check.

yes, let's do it, besides maybe making it into an option, not a flag. update-from-backup has --mode verify; maybe let's add a similar --mode force-fast here?
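In other words, something like this hypothetical invocation (the --mode force-fast option does not exist yet; its name and placement are just the proposal above, grafted onto the real command line from the top of this issue):

```shell
python -m tools.backups2datalad -l DEBUG --backup-root /mnt/backup/dandi \
    --config tools/backups2datalad.cfg.yaml populate --mode force-fast 000618
```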

But I personally failed to make it "fast" or copy at all (might just need to rest...): https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/

NB: I first tried to copy files in a sample of those dandisets -- 000409, which had 39 files not present in dropbox. And I got a flood of, I guess, unprocessed outputs from rclone queries, and/or maybe git-annex also querying the files which it knows it already has in dropbox... that led me to that exploration/bug report against git-annex:

```shell
dandi@drogon:/mnt/backup/dandi/dandisets/000409$ git annex find --in web --not --in dandi-dandisets-dropbox | grep -v '^|' | wc -l
39
dandi@drogon:/mnt/backup/dandi/dandisets/000409$ git annex copy --fast --from web --to dandi-dandisets-dropbox
copy sub-CSH-ZAD-001/sub-CSH-ZAD-001_ses-3e7ae7c0-fe8b-487c-9354-036236fa1010-processed-only_behavior.nwb
Total objects: 1 Total size: 317.579 MBytes (333005531 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-CSH-ZAD-001/sub-CSH-ZAD-001_ses-3e7ae7c0-fe8b-487c-9354-036236fa1010_behavior+ecephys+image.nwb
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
... MANY more of such lines ...
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 636.587 MBytes (667510033 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-CSH-ZAD-001/sub-CSH-ZAD-001_ses-3e7ae7c0-fe8b-487c-9354-036236fa1010_behavior+ecephys+image/sub-CSH-ZAD-001_ses-3e7ae7c0-fe8b-487c-9354-036236fa1010_OriginalVideoLeftCamera.mp4
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 32.261 MBytes (33828440 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-CSH-ZAD-011/sub-CSH-ZAD-011_ses-5b44c40f-80f4-44fb-abfb-c7f19e27a6ca-processed-only_behavior.nwb
Total objects: 1 Total size: 367.072 MBytes (384902813 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-CSH-ZAD-011/sub-CSH-ZAD-011_ses-5b44c40f-80f4-44fb-abfb-c7f19e27a6ca_behavior+ecephys+image.nwb
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
...
```
and indeed -- testing on 000003, where all files are "known" to be present in dropbox already, doing `copy --fast` still queries them -- once per chunk (we chunk up in dropbox to not hit the size limit):

```shell
dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --from web --to dandi-dandisets-dropbox
copy sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 662.531 MBytes (694713998 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140324_behavior+ecephys.nwb
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
^C
real 0m15.048s
user 0m0.095s
sys 0m0.033s
```
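The repeated queries line up with the remote's chunk size: each 1000000000-byte (1 GB) chunk, reported above as "953.674 MBytes", gets its own rclone size check. For reference, chunking for an rclone special remote is configured at initremote time roughly like this (a sketch with assumed parameter values; the actual dandi-dandisets-dropbox remote was initialized long ago and may differ):

```shell
# hypothetical initialization; target/prefix/chunk values are assumptions
git annex initremote dandi-dandisets-dropbox type=external externaltype=rclone \
    target=dropbox prefix=git-annex chunk=1GB encryption=none
```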
jwodder commented 9 months ago

> maybe let's add a similar --mode force-fast here?

Are you saying you want this new mode to pass --fast to git-annex copy even though you can't get it to work right?

yarikoptic commented 9 months ago

ATM let's just wait for @joeyh's follow-up on that bug report, since indeed I cannot make it work (do you see where I might have made a mistake?) -- I am even wondering if the reason it doesn't work for me relates to why we got some files not backed up...?

yarikoptic commented 9 months ago

ok, troubleshot it and filed a dedicated https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/

and here is the current situation, which aligns with that observation -- most likely (not in all of them, I agree, but I didn't try to run that `copy --fast` in all) we indeed had some keys already present locally for one reason or another, as shown in this output, and thus those keys were not transferred:

```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets$ for ds in 000*; do n=$(git -C $ds annex find --in web --not --in dandi-dandisets-dropbox | grep -v '^|' | wc -l); [ "$n" == "0" ] || { n2=$(git -C $ds annex find --in here --not --in dandi-dandisets-dropbox | grep -v '^|' | wc -l); echo "$ds $n but here $n2 "; }; done
000026 6 but here 6
000037 41 but here 41
000059 49 but here 49
000238 4 but here 4
000246 144 but here 110
000363 159 but here 135
000409 39 but here 39
000410 1 but here 1
000447 1 but here 1
000458 3 but here 3
000465 13 but here 13
000467 508 but here 503
000477 4 but here 4
000488 3 but here 3
000535 3 but here 3
000540 7 but here 0
000541 8 but here 8
000544 2 but here 0
000547 6 but here 6
000549 2 but here 2
000550 3 but here 3
000552 9 but here 9
000554 23 but here 0
000559 11 but here 11
000561 7 but here 7
000568 28 but here 28
000570 1 but here 1
000571 3 but here 3
000572 26 but here 26
000574 17 but here 17
000576 4 but here 4
000579 13 but here 13
000582 1 but here 1
000618 32 but here 32
000624 14 but here 14
000626 1 but here 0
000628 31 but here 31
000630 210 but here 0
000631 2 but here 2
000632 7 but here 7
000635 4 but here 4
000636 705 but here 0
000637 12 but here 12
000640 167 but here 11
000674 3 but here 3
000678 6 but here 6
000687 5 but here 0
000691 1 but here 1
000692 9 but here 0
000696 1 but here 0
```

I think for a workaround:
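(The concrete steps were elided above; here is a sketch reconstructed from the discussion that follows -- two copy invocations, both with --fast, first pushing keys already present locally, then copying the remainder from the web remote. Not verbatim from the issue.)

```shell
# sketch of the proposed copy tandem, per dandiset
git annex copy --fast --to dandi-dandisets-dropbox              # local keys -> dropbox
git annex copy --fast --from web --to dandi-dandisets-dropbox   # the rest, via web
```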

jwodder commented 9 months ago

@yarikoptic Should the two copy commands be run every time populate and populate-zarrs are run, or only when --mode force-fast is given?

yarikoptic commented 9 months ago

> Should the two copy commands be run every time populate and populate-zarrs are run

no -- they should be run according to the logic you described above, i.e. not every time but depending on dandi.populated.

> or only when --mode force-fast is given?

I think this mode should pretty much just disable the dandi.populated check and proceed to trigger that copy tandem.

jwodder commented 9 months ago

@yarikoptic

yarikoptic commented 9 months ago

> • If the current git-annex copy command is replaced by two copies, both invoked with --fast, and if the only thing --mode force-fast does is skip the dandi.populated check, why name it "force-fast" instead of just "force"?

we might later get "force" which would do the same without the --fast option added -- it would be slower, since it would check presence on the destination more thoroughly. But I hope we can avoid needing it.

> • I am very much not a fan of CLI options that only accept one specific required string as an argument; it would make much more sense for --mode force[-fast] to be --force[-fast] instead. If you're worried about making a breaking change to the CLI later if & when more modes are added, those can be done via feature switches.

I guess it could be, indeed. I would not fight heavily over it here, especially since ATM this is all internal functionality and thus we can easily break interfaces later on, especially the ones for "emergency use". So feel welcome to proceed the way you feel is better. But in my opinion:

> ok, troubleshot it and filed a dedicated https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/

fixed in 10.20230926-142-g6e3bcbf4dd.
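To confirm that a deployed git-annex includes that fix, comparing the reported version string should suffice (for example):

```shell
git annex version | head -n 1
# expect git-annex version: 10.20230926-142-g6e3bcbf4dd or later
```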

yarikoptic commented 9 months ago

FTR, did

for d in 00*; do echo $d; git -C $d annex move -J4 --fast --to dandi-dandisets-dropbox; done

to move what is local to dropbox first, and eventually transfers started to stall. I think the first one was:

000363
(recording state in git...)
move sub-440956/sub-440956_ses-20190207T120657_behavior+ecephys+image+ogen.nwb
move sub-440956/sub-440956_ses-20190208T133600_behavior+ecephys+ogen.nwb ok    (to dandi-dandisets-dropbox...)
move sub-440957/sub-440957_ses-20190212T153751_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
move sub-440957/sub-440957_ses-20190211T143614_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
move sub-440957/sub-440957_ses-20190214T144611_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
move sub-440956/sub-440956_ses-20190209T150135_behavior+ecephys+image+ogen.nwb
2023/11/18 08:43:07 ERROR : : error listing: directory not found 2023/11/18 08:43:07 Failed to size with 2 errors: last error was: directory not found
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
0%    953.67 MiB      618 MiB/s 11m12s
2023/11/18 08:73:08 ERROR7: : error l14T144611irectory not found 2023/11/18 08:(to dandi-dandisets-dropbox...) rs: last error was: directory not found
1%    1.86 GiB        472 MiB/s 9m47s
Total objects:61 Total si6e: 953.674 08T133600000000000 Bytes)  ogen.nwb ok
Total objects: 1 Total size: 953.674 M2T153751000000000 Bytes)
0%    1.86 GiB        417 MiB/s 16m8s
move sub-440957/sub-440957_ses-20190211T143614_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
0%    953.67 MiB      618 MiB/s 11m12s
move sub-440957/sub-440957_ses-20190214T144611_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
move sub-440956/sub-440956_ses-20190209T150135_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
  Transfer stalled
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
2023/11/18 08:43:14 ERROR : : error listing: directory not found 2023/11/18 08:43:14 Failed to size with 2 errors: last error was: directory not found

move sub-440957/sub-440957_ses-20190211T143614_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
  Transfer stalled

  Transfer stalled
failed
move sub-440956/sub-440956_ses-20190207T120657_behavior+ecephys+image+ogen.nwb (to dandi-dandisets-dropbox...)
  Transfer stalled

I interrupted for now... need to troubleshoot (maybe the formerly unlimited dropbox finally became limited?)
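One quick way to check whether the dropbox side is out of quota (assuming the rclone remote is named `dropbox`; adjust to the remote's actual target name):

```shell
rclone about dropbox:
# reports Total/Used/Free; a full remote would explain stalled uploads
```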

yarikoptic commented 9 months ago

FTR: that box is experiencing a hard time across all layers... I am running btrfs filesystem resize 1:-30T /mnt/backup/ to start getting rid of old drives in LVM and composing at the BTRFS level instead of LVM, but there are some reports of corruption errors etc... inquiring on BTRFS IRC. It also does healthchecks etc. -- I will stop/disable that for now.

So maybe those also contribute to the somewhat unstable operation above.

Also tagged/released a new git-annex-remote-rclone and am testing it (will install the new debian pkg when it is in neurodebian) -- it works nicely and does not make as much noise. On one sample it did complain about a stalled transfer but then ended up with ok:

dandi@drogon:/mnt/backup/dandi/dandisets/000477$ PATH=/home/dandi/proj/git-annex-remote-rclone:$PATH  git annex move -J4  --fast --to dandi-dandisets-dropbox
move sub-SKKS092/sub-SKKS092_ses-20210304T122622_behavior+ophys.nwb (to dandi-dandisets-dropbox...) ok
move sub-SKKS092/sub-SKKS092_ses-20210217T090900_behavior+ophys.nwb (to dandi-dandisets-dropbox...) ok
move sub-SKKS091/sub-SKKS091_ses-20210518T174412_behavior+ophys.nwb (to dandi-dandisets-dropbox...) ok
move sub-SKKS097/sub-SKKS097_ses-20210217T120353_behavior+ophys.nwb (to dandi-dandisets-dropbox...)
  Transfer stalled
ok
(recording state in git...)