Closed yarikoptic closed 1 year ago
FTR, here is the status of the recently modified zarrs:
(dandisets) dandi@drogon:/mnt/backup/dandi/dandizarrs$ head -n 13 /tmp/zarrs.list | grep -e '-.*-.*-.*-' | awk '{print $9}' | while read d; do echo -n "$d: "; git -C $d status | grep -q 'working tree clean' && echo clean || echo dirty; done
837ad88c-3397-4515-9a64-b247b5dafb11: dirty
ce3199b6-25bc-42ff-9771-9074a75fa693: clean
ea8c43c7-757e-4653-8e4a-a6d356120836: clean
7a7fbb58-22a2-4105-a1f1-eff6fbc89164: clean
25329906-de5c-4cf5-9ddd-f43c528704c0: clean
ad50ab40-2346-4528-b077-41eedf00c090: dirty
b8b833f4-fbb0-4759-a46f-941695cd137c: clean
fb035cb4-6bf2-4072-8241-f9d8dbe92d72: clean
68e56208-a538-4c40-9382-d7e549ff0070: clean
80404d11-0e71-4200-b352-7cbf1d6fbc3f: clean
I didn't spot any hint of a possible cause in the logs yet; I will proceed with a reset etc. on those that are not clean.
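For repeated checks, the shell loop above could be sketched in Python. This is only an illustration: the helper names `is_clean` and `repo_status` are made up here, and it relies on `git status --porcelain`, which prints nothing for a clean working tree, instead of grepping the human-readable message.

```python
import subprocess
from pathlib import Path


def is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per change; no lines means clean."""
    return porcelain_output.strip() == ""


def repo_status(repo: Path) -> str:
    """Return "clean" or "dirty" for the git repository at `repo`."""
    result = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    )
    return "clean" if is_clean(result.stdout) else "dirty"
```

Usage would be something like `repo_status(Path("837ad88c-3397-4515-9a64-b247b5dafb11"))`, run from the directory holding the zarr datasets. Untracked files also count as "dirty" here, which matches what the grep for 'working tree clean' reports.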
grr -- it crashed again, even with --workers 2,
with the "beginning of the end" looking to me like this:
2022-11-03T04:55:02-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000108/versions/draft/assets/9f4330bb-2b7f-4995-bde3-855390a3bf4b/info/ in 0.972675 seconds as it raised ConnectTimeout:
2022-11-03T04:55:27-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000108/versions/draft/assets/9f4330bb-2b7f-4995-bde3-855390a3bf4b/info/ in 2.055273 seconds as it raised ConnectTimeout:
2022-11-03T04:55:37-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000108/versions/draft/assets/9f4330bb-2b7f-4995-bde3-855390a3bf4b/info/ in 4.055083 seconds as it raised ConnectTimeout:
2022-11-03T04:55:59-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000108/versions/draft/assets/9f4330bb-2b7f-4995-bde3-855390a3bf4b/info/ in 7.642359 seconds as it raised ConnectTimeout:
2022-11-03T04:56:25-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000108/versions/draft/assets/9f4330bb-2b7f-4995-bde3-855390a3bf4b/info/ in 16.100357 seconds as it raised ConnectTimeout:
2022-11-03T04:57:09-0400 [ERROR ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'204\x000/0/.../98\x000/0/0/')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'204\x000/0/.../98\x000/0/0/')>
Traceback (most recent call last):
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/subprocess.py", line 73, in pipe_data_received
reader.feed_data(data)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/streams.py", line 472, in feed_data
assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
The log is /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.02.16.24.18Z.log . @jwodder -- please take another fresh look and try to figure out what the problem could be here: whether we are swallowing some exception, and whether we are logging all the stdout/stderr that might be relevant.
Having merged #294 and having run git reset/clean on the recently modified zarr datalad datasets, I have restarted with --workers 6
to hopefully get a better idea of the effect of #294 ;-)
The process crashed; the log is
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.03.19.21.15Z.log
(venv) (base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ grep 'Exception raised' 2022.11.03.*
(venv) (base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$
@jwodder does it mean that https://github.com/agronholm/anyio/issues/490 is not our underlying issue?
There are also many leftover git-annex find processes, some of them quite old -- apparently, when the script dies, they are not fully killed. For now I will kill them manually, but could anything be done to ensure they are killed by our script, @jwodder?
(venv) (base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ ps auxw -H | grep -A3 git-anne[x]
dandi 3086865 0.0 0.0 1074053024 2072 pts/8 Sl Oct26 0:00 git-annex find --include=* --json
dandi 3086872 0.0 0.0 21876 1656 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3086873 0.0 0.0 50496 516 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3086874 0.0 0.0 50496 544 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3086887 0.0 0.0 1074053024 2072 pts/8 Sl Oct26 0:00 git-annex find --include=* --json
dandi 3086894 0.0 0.0 21876 1780 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3086895 0.0 0.0 50512 1096 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3086896 0.0 0.0 50512 1144 pts/8 S Oct26 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2338941 0.0 0.0 1074053024 2408 pts/8 Sl Oct31 0:05 git-annex find --include=* --json
dandi 2338948 0.0 0.0 21052 1840 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 2338949 0.0 0.0 45388 1624 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2338950 0.0 0.0 45388 1700 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2339079 0.0 0.0 1074053024 2408 pts/8 Sl Oct31 0:05 git-annex find --include=* --json
dandi 2339086 0.0 0.0 20368 1900 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 2339087 0.0 0.0 44612 1392 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2339088 0.0 0.0 44612 1308 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2339578 0.0 0.0 1074053024 2360 pts/8 Sl Oct31 0:05 git-annex find --include=* --json
dandi 2339585 0.0 0.0 20368 1892 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 2339586 0.0 0.0 44612 1376 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2339587 0.0 0.0 44612 1456 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2344705 0.0 0.0 1074053024 2348 pts/8 Sl Oct31 0:00 git-annex find --include=* --json
dandi 2344712 0.0 0.0 19164 1776 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 2344713 0.0 0.0 39896 8 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 2344714 0.0 0.0 39896 8 pts/8 S Oct31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1003159 0.0 0.0 1074053024 2984 pts/8 Sl Nov02 0:11 git-annex find --include=* --json
dandi 1003166 0.0 0.0 24976 11192 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1003167 0.0 0.0 43496 1060 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1003168 0.0 0.0 43496 1060 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1004853 0.0 0.0 1074053024 5424 pts/8 Sl Nov02 0:09 git-annex find --include=* --json
dandi 1004860 0.0 0.0 24976 4616 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1004861 0.0 0.0 43508 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1004862 0.0 0.0 43508 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1005122 0.0 0.0 1074053024 4488 pts/8 Sl Nov02 0:08 git-annex find --include=* --json
dandi 1005129 0.0 0.0 24976 7040 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1005130 0.0 0.0 52232 12 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1005131 0.0 0.0 52232 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1005596 0.0 0.0 1074053024 6224 pts/8 Sl Nov02 0:08 git-annex find --include=* --json
dandi 1005603 0.0 0.0 24976 8576 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1005604 0.0 0.0 56708 1724 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1005605 0.0 0.0 56708 1824 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1012017 0.0 0.0 1074053024 2332 pts/8 Sl Nov02 0:07 git-annex find --include=* --json
dandi 1012044 0.0 0.0 24976 1812 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1012046 0.0 0.0 58948 1872 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1012048 0.0 0.0 58948 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1023752 0.0 0.0 1074053024 2460 pts/8 Sl Nov02 0:05 git-annex find --include=* --json
dandi 1023759 0.0 0.0 24976 1872 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1023760 0.0 0.0 60564 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1023761 0.0 0.0 60564 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1024182 0.0 0.0 1074053024 2136 pts/8 Sl Nov02 0:05 git-annex find --include=* --json
dandi 1024189 0.0 0.0 24976 1928 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1024190 0.0 0.0 59532 32 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1024191 0.0 0.0 59532 228 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1025269 0.0 0.0 1074053024 2748 pts/8 Sl Nov02 0:05 git-annex find --include=* --json
dandi 1025280 0.0 0.0 24976 1876 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1025281 0.0 0.0 55924 792 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1025282 0.0 0.0 55924 924 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1025721 0.0 0.0 1074053024 4820 pts/8 Sl Nov02 0:05 git-annex find --include=* --json
dandi 1025728 0.0 0.0 24976 10372 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1025729 0.0 0.0 53088 1220 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1025731 0.0 0.0 53088 1392 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1026364 0.0 0.0 1074053024 2348 pts/8 Sl Nov02 0:04 git-annex find --include=* --json
dandi 1026371 0.0 0.0 24976 10696 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1026372 0.0 0.0 45864 52 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1026373 0.0 0.0 45864 8 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1031092 0.0 0.0 1074053024 8156 pts/8 Sl Nov02 0:01 git-annex find --include=* --json
dandi 1031099 0.0 0.0 24480 3940 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1031100 0.0 0.0 46304 1564 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1031101 0.0 0.0 46304 1668 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1032723 0.0 0.0 1074053024 2304 pts/8 Sl Nov02 0:00 git-annex find --include=* --json
dandi 1032730 0.0 0.0 24480 3460 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 1032731 0.0 0.0 52248 992 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 1032732 0.0 0.0 52248 1244 pts/8 S Nov02 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 4515 0.0 0.0 1074053024 16248 pts/8 Sl Nov03 0:12 git-annex find --include=* --json
dandi 4522 0.0 0.0 0 0 pts/8 Z Nov03 0:00 [git] <defunct>
dandi 4523 0.0 0.0 52232 1176 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 4524 0.0 0.0 52232 1796 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 7620 0.0 0.0 1074053024 14392 pts/8 Sl Nov03 0:08 git-annex find --include=* --json
dandi 7627 0.0 0.0 24976 15964 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 7628 0.0 0.0 59532 1376 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 7629 0.0 0.0 59532 268 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26207 0.0 0.0 1074053024 13056 pts/8 Sl Nov03 0:04 git-annex find --include=* --json
dandi 26214 0.0 0.0 24976 15720 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 26215 0.0 0.0 45864 756 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26216 0.0 0.0 45864 900 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26217 0.0 0.0 1074053024 12024 pts/8 Sl Nov03 0:04 git-annex find --include=* --json
dandi 26224 0.0 0.0 24976 16076 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 26225 0.0 0.0 53088 284 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26226 0.0 0.0 53088 300 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26227 0.0 0.0 1074053024 12464 pts/8 Sl Nov03 0:04 git-annex find --include=* --json
dandi 26234 0.0 0.0 24976 15992 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 26235 0.0 0.0 55924 2020 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26236 0.0 0.0 55924 2172 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26432 0.0 0.0 1074053024 13776 pts/8 Sl Nov03 0:03 git-annex find --include=* --json
dandi 26439 0.0 0.0 24480 15288 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 26440 0.0 0.0 46304 700 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26441 0.0 0.0 46304 644 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26618 0.0 0.0 1074053024 11736 pts/8 Sl Nov03 0:03 git-annex find --include=* --json
dandi 26625 0.0 0.0 24480 15304 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 26626 0.0 0.0 52248 284 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 26627 0.0 0.0 52248 300 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27041 0.0 0.0 1074053024 11508 pts/8 Sl Nov03 0:02 git-annex find --include=* --json
dandi 27048 0.0 0.0 24480 15440 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 27049 0.0 0.0 55288 284 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27050 0.0 0.0 55288 300 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27507 0.0 0.0 1074053024 11888 pts/8 Sl Nov03 0:02 git-annex find --include=* --json
dandi 27514 0.0 0.0 24480 15300 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 27515 0.0 0.0 54860 280 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27516 0.0 0.0 54860 304 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27561 0.0 0.0 1074053024 11896 pts/8 Sl Nov03 0:02 git-annex find --include=* --json
dandi 27576 0.0 0.0 24480 15348 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 27577 0.0 0.0 54564 284 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27578 0.0 0.0 54564 300 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27762 0.0 0.0 1074053024 11868 pts/8 Sl Nov03 0:02 git-annex find --include=* --json
dandi 27769 0.0 0.0 24480 15348 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 27770 0.0 0.0 52888 284 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 27771 0.0 0.0 52888 304 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 28340 0.0 0.0 1074053024 12576 pts/8 Sl Nov03 0:01 git-annex find --include=* --json
dandi 28347 0.0 0.0 24480 15352 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 28348 0.0 0.0 51756 856 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 28350 0.0 0.0 51756 824 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 28997 0.0 0.0 1074053024 9744 pts/8 Sl Nov03 0:00 git-annex find --include=* --json
dandi 29004 0.0 0.0 24480 15444 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 29005 0.0 0.0 47764 280 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 29006 0.0 0.0 47764 2140 pts/8 S Nov03 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
--
dandi 3616062 0.7 0.0 1074053024 36472 pts/8 Sl 07:31 0:11 git-annex find --include=* --json
dandi 3616069 0.0 0.0 25048 18052 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3616070 0.0 0.0 40604 11644 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616071 0.0 0.0 40604 11528 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616142 0.7 0.0 1074053024 34376 pts/8 Sl 07:31 0:12 git-annex find --include=* --json
dandi 3616149 0.0 0.0 0 0 pts/8 Z 07:31 0:00 [git] <defunct>
dandi 3616150 0.0 0.0 60064 24028 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616151 0.0 0.0 60064 19888 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616220 0.7 0.0 1074053024 34692 pts/8 Sl 07:31 0:11 git-annex find --include=* --json
dandi 3616227 0.0 0.0 0 0 pts/8 Z 07:31 0:00 [git] <defunct>
dandi 3616228 0.0 0.0 55664 19908 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616229 0.0 0.0 55664 20032 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616299 0.7 0.0 1074053024 34720 pts/8 Sl 07:31 0:11 git-annex find --include=* --json
dandi 3616306 0.0 0.0 24976 17852 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3616307 0.0 0.0 51312 20100 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616308 0.0 0.0 51312 15704 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616682 0.7 0.0 1074053024 35300 pts/8 Sl 07:31 0:11 git-annex find --include=* --json
dandi 3616689 0.0 0.0 24976 18012 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3616690 0.0 0.0 43496 15976 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3616691 0.0 0.0 43496 15924 pts/8 S 07:31 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617046 0.6 0.0 1074053024 34468 pts/8 Sl 07:32 0:10 git-annex find --include=* --json
dandi 3617053 0.0 0.0 24976 18088 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3617054 0.0 0.0 43508 11756 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617055 0.0 0.0 43508 11408 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617298 0.5 0.0 1074053024 36844 pts/8 Sl 07:32 0:09 git-annex find --include=* --json
dandi 3617305 0.0 0.0 24976 18044 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3617306 0.0 0.0 52232 15760 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617307 0.0 0.0 52232 15744 pts/8 S 07:32 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617796 0.5 0.0 1074053024 34628 pts/8 Sl 07:33 0:07 git-annex find --include=* --json
dandi 3617803 0.0 0.0 24976 17988 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3617804 0.0 0.0 56708 15716 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617805 0.0 0.0 56708 15748 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617970 0.5 0.0 1074053024 34524 pts/8 Sl 07:33 0:07 git-annex find --include=* --json
dandi 3617977 0.0 0.0 24976 17980 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3617978 0.0 0.0 58948 15936 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3617979 0.0 0.0 58948 15792 pts/8 S 07:33 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3618449 0.4 0.0 1074053024 34440 pts/8 Sl 07:34 0:06 git-annex find --include=* --json
dandi 3618456 0.0 0.0 24976 18064 pts/8 S 07:34 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3618457 0.0 0.0 60564 16148 pts/8 S 07:34 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3618458 0.0 0.0 60564 15952 pts/8 S 07:34 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3620441 0.2 0.0 1074053024 34208 pts/8 Sl 07:35 0:03 git-annex find --include=* --json
dandi 3620448 0.0 0.0 24976 18192 pts/8 S 07:35 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3620449 0.0 0.0 59532 15880 pts/8 S 07:35 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3620450 0.0 0.0 59532 15736 pts/8 S 07:35 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3620982 0.1 0.0 1074053024 34376 pts/8 Sl 07:36 0:02 git-annex find --include=* --json
dandi 3620989 0.0 0.0 24976 18120 pts/8 S 07:36 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3620990 0.0 0.0 55924 11924 pts/8 S 07:36 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3620991 0.0 0.0 55924 11512 pts/8 S 07:36 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3621445 0.0 0.0 1074053024 34520 pts/8 Sl 07:37 0:00 git-annex find --include=* --json
dandi 3621452 0.0 0.0 24976 18052 pts/8 S 07:37 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs ls-files --stage -z --error-unmatch --
dandi 3621453 0.0 0.0 53088 11892 pts/8 S 07:37 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch-check=%(objectname) %(objecttype) %(objectsize) --buffer
dandi 3621454 0.0 0.0 53088 11624 pts/8 S 07:37 0:00 git --git-dir=.git --work-tree=. --literal-pathspecs cat-file --batch=%(objectname) %(objecttype) %(objectsize) --buffer
(venv) (base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$
@yarikoptic I'm not sure what exactly is happening anymore. However, I've done my best to kill any & all subprocesses on error in #295.
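For readers following along, here is a minimal sketch of the general idea of tying a subprocess's lifetime to the code that uses it. It is not the code from #295; `managed_process` and `demo` are hypothetical names, and it uses plain asyncio rather than anyio to stay self-contained.

```python
import asyncio
import contextlib
import sys


@contextlib.asynccontextmanager
async def managed_process(*argv, grace=5.0):
    """Start a child process and make sure it is gone when the block exits."""
    proc = await asyncio.create_subprocess_exec(*argv)
    try:
        yield proc
    finally:
        # Runs on normal exit, on an exception, and on task cancellation.
        if proc.returncode is None:
            proc.terminate()
            try:
                await asyncio.wait_for(proc.wait(), timeout=grace)
            except asyncio.TimeoutError:
                # The child ignored the polite request; force it.
                proc.kill()
                await proc.wait()


async def demo():
    # A long-running child standing in for something like `git-annex find`.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    try:
        async with managed_process(*sleeper) as proc:
            raise RuntimeError("simulated failure in the parent")
    except RuntimeError:
        pass
    # No longer None: the child was reaped instead of being left behind.
    return proc.returncode
```

Note that this only signals the direct child. The ps listing above shows git-annex with its own git children, so a real fix might also need process groups or similar, which this sketch does not attempt.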
After #295 we get something "fresh":
2022-11-08T04:26:52-0500 [ERROR ] backups2datalad: Exception raised while handling output from `git ls-tree -r --name-only -z HEAD` [cwd=/mnt/backup/dandi/dandizarrs/c2b1efb6-27c5-4747-9b79-7141ed98a513]
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 226, in stream_null_command
async for chunk in splitter.aitersplit(stream):
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/linesep/splitters.py", line 219, in aitersplit
async for s in aiterable:
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/abc/_streams.py", line 31, in __anext__
return await self.receive()
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/streams/text.py", line 43, in receive
chunk = await self.transport_stream.receive()
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 1010, in receive
data = await self._stream.read(max_bytes)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/streams.py", line 684, in read
await self._wait_for_data('read')
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/streams.py", line 517, in _wait_for_data
await self._waiter
asyncio.exceptions.CancelledError
...
2022-11-08T04:26:52-0500 [ERROR ] asyncio: Exception in callback SubprocessStreamProtocol.pipe_data_received(1, b'1/274\x000/...9/9/196\x000/')
handle: <Handle SubprocessStreamProtocol.pipe_data_received(1, b'1/274\x000/...9/9/196\x000/')>
Traceback (most recent call last):
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/subprocess.py", line 73, in pipe_data_received
reader.feed_data(data)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/asyncio/streams.py", line 472, in feed_data
assert not self._eof, 'feed_data after feed_eof'
AssertionError: feed_data after feed_eof
...
2022-11-08T04:27:05-0500 [ERROR ] backups2datalad: Job failed on input <Dandiset 000108/draft>:
Traceback (most recent call last):
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/aioutil.py", line 167, in dowork
outp = await func(inp)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 145, in update_dandiset
changed = await self.sync_dataset(dandiset, ds, dmanager)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/datasetter.py", line 188, in sync_dataset
await syncer.sync_assets(error_on_change)
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/syncer.py", line 36, in sync_assets
self.report = await async_assets(
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/asyncer.py", line 499, in async_assets
nursery.start_soon(dm.read_addurl)
File "/home/dandi/miniconda3/envs/dandisets/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
raise exceptions[0]
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 526, in sync_zarr
await zsync.run()
File "/mnt/backup/dandi/dandisets/tools/backups2datalad/zarr.py", line 260, in run
raise RuntimeError(
RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
full log - /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.07.18.36.17Z.log .
could it be that the "RuntimeError: differs from remote checksum" is the one which causes cancellation of all the async processes?
at least in one of the two previously mentioned logs I see the same exception and even for the same zarr!
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ grep RuntimeError /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.02.16.24.18Z.log
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log: raise RuntimeError(
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log:RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log: raise RuntimeError(
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log:RuntimeError: Backups for 1 Dandiset failed
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log: raise RuntimeError(
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log:RuntimeError: Backups for 1 Dandiset failed
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log: raise RuntimeError(
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log:RuntimeError: Backups for 1 Dandiset failed
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log: raise RuntimeError(
/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log:RuntimeError: Backups for 1 Dandiset failed
double-checking checksum from the server
❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/zarr/eb14aa4a-af93-41be-a62a-aa0dd573d581/' -H 'accept: application/json' | jq .checksum
"c72911f5e683930da80c90e8bcf3104a-4000--5785686453"
does our logic behind this error message on zarrs just mean that the zarr checksum computed on the server differs from what we get from S3, and that the zarr should be reingested?
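A minimal sketch for reading the two checksums side by side. It assumes the checksum string has the layout `<md5 hex digest>-<file count>--<total size in bytes>`, which is an interpretation of the values above and is not confirmed anywhere in this thread; `ZarrChecksum` and `parse_zarr_checksum` are hypothetical names, not part of backups2datalad.

```python
import re
from typing import NamedTuple

# ASSUMPTION: a Zarr checksum looks like
#   <32 hex chars of an MD5 digest>-<number of files>--<total size in bytes>
CHECKSUM_RE = re.compile(r"^([0-9a-f]{32})-([0-9]+)--([0-9]+)$")


class ZarrChecksum(NamedTuple):
    """Hypothetical container for the three parts of a checksum string."""
    digest: str
    files: int
    size: int


def parse_zarr_checksum(checksum: str) -> ZarrChecksum:
    """Split a checksum string into digest, file count and byte size."""
    m = CHECKSUM_RE.match(checksum)
    if m is None:
        raise ValueError(f"unrecognized Zarr checksum: {checksum!r}")
    return ZarrChecksum(m[1], int(m[2]), int(m[3]))


# The two values reported in the RuntimeError above
local = parse_zarr_checksum("fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373")
remote = parse_zarr_checksum("c72911f5e683930da80c90e8bcf3104a-4000--5785686453")

print(f"files: local={local.files} remote={remote.files}")
print(f"bytes: local={local.size} remote={remote.size}")
```

Read that way, the value reported by the server covers far fewer files and bytes than the local copy, which would be consistent with the digest on the server being stale rather than with the backup being incomplete.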
ran a grep to see in how many of the logs where the feed_data error happened such a RuntimeError was present too:
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ grep -l 'AssertionError: feed_data after feed_eof' *log _*/*log | while read f; do echo -n "$f: " ; grep "Zarr.*checksum.*differs" $f || echo -; done
2022.11.01.19.31.16Z.log: RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
2022.11.02.16.24.18Z_filt.log: -
2022.11.02.16.24.18Z.log: -
2022.11.03.19.21.15Z.log: RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
2022.11.04.12.49.38Z.log: RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
2022.11.07.18.36.17Z.log: RuntimeError: Zarr eb14aa4a-af93-41be-a62a-aa0dd573d581: local checksum 'fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373' differs from remote checksum 'c72911f5e683930da80c90e8bcf3104a-4000--5785686453' after backup, and no change on server was detected
_old_/2022.09.08.18.27.28Z_filt.log: -
_old_/2022.09.08.18.27.28Z.log: -
_old_/2022.09.09.13.53.32Z_filt.log: -
_old_/2022.09.09.13.53.32Z.log: -
_old_/2022.09.12.13.46.43Z_filt.log: -
_old_/2022.09.12.13.46.43Z.log: -
_old_/2022.09.21.17.03.13Z_filt.log: -
_old_/2022.09.21.17.03.13Z.log: -
so some old logs didn't have it but recent did, I will run sync for that zarr alone for now and see where we get. It was redigested:
❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/zarr/eb14aa4a-af93-41be-a62a-aa0dd573d581/' -H 'accept: application/json' | jq .checksum
"fe61f9e261c479d7e9d250e88a7659f1-103386--149262577373"
so we have that one matching! I will reset recent dirty zarrs again and restart backup process.
@jwodder do you think it would be possible to delay crashing the entire process due to that RuntimeError until all zarrs/assets are considered i.e. to accumulate the errors and then spit out some "composite exception"? I know that it sounds cumbersome, but I am afraid that there would be more than one such zarr in the archive, and given that we go through them in the same order each time and it takes hours to reach the next failing
@yarikoptic
do you think it would be possible to delay crashing the entire process due to that RuntimeError until all zarrs/assets are considered i.e. to accumulate the errors and then spit out some "composite exception"?
Exactly what do you want to happen instead of the current behavior? If a checksum mismatch is detected after backing up a Zarr, should the Zarr backup just conclude normally aside from the delayed error? Exactly when should the delayed error be raised?
it was more of a feasibility question, but could be expressed in primitive python as
exceptions = []
for zarr in zarrs:
    try:
        dowhatever(zarr)
    except Exception as exc:
        lgr.exception("Delaying raising for exception while processing zarrs")
        exceptions.append(exc)
if exceptions:
    raise RuntimeError(f"{len(exceptions)} exceptions while dowhatevering zarrs happened. Details were logged")
Also WDYT about the question above
could it be that the "RuntimeError: differs from remote checksum" is the one which causes cancellation of all the async processes?
@yarikoptic Is that code intended as a direct answer to my questions? Catching exceptions raised while processing Zarrs isn't really an option, as the Zarrs are processed concurrently with the rest of the assets in a task group.
Also WDYT about the question above
Possibly.
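To illustrate why that is plausible, here is a minimal sketch of fail-fast behaviour in a group of concurrent tasks: one task raising leads to its siblings being cancelled, so they surface `CancelledError` while the original `RuntimeError` is what gets re-raised. It uses only standard-library `asyncio` and emulates the cancel-on-first-error policy by hand; it is not the anyio task group that backups2datalad uses, and `slow_asset`, `bad_zarr` and `run_group` are hypothetical names.

```python
import asyncio


async def slow_asset(name, log):
    """Stand-in for a long-running download; records if it gets cancelled."""
    try:
        await asyncio.sleep(10)
        log.append(f"{name}: done")
    except asyncio.CancelledError:
        log.append(f"{name}: cancelled")
        raise


async def bad_zarr():
    """Stand-in for a Zarr sync that fails its checksum comparison."""
    await asyncio.sleep(0.01)
    raise RuntimeError("local checksum differs from remote checksum")


async def run_group(log):
    # Hand-written emulation of a fail-fast task group:
    # wait until the first task raises, then cancel everything still running.
    tasks = [
        asyncio.ensure_future(slow_asset("asset-1", log)),
        asyncio.ensure_future(slow_asset("asset-2", log)),
        asyncio.ensure_future(bad_zarr()),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks finish unwinding before re-raising
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        if task.exception() is not None:
            raise task.exception()


def demo():
    log = []
    try:
        asyncio.run(run_group(log))
    except RuntimeError as exc:
        log.append(f"group failed: {exc}")
    return log


print(demo())
```

In this sketch both asset tasks report being cancelled before the group fails with the checksum error, which resembles the mix of `CancelledError` tracebacks and the final `RuntimeError` in the log above.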
@yarikoptic Is that code intended as a direct answer to my questions?
no, it is not the direct answer - it was just a formulation in Python. But here are the answers to your questions:
If a checksum mismatch is detected after backing up a Zarr, should the Zarr backup just conclude normally aside from the delayed error?
I am not sure what other steps would lead to "conclude" but really it should not be considered "normal course of backup". The effect I wanted is to delay erroring the entire process out until we are done with all zarrs first, where some might "conclude normally" since they would be perfectly ok, and some are known to be problematic. And then if there was any problematic zarr -- raise an exception and refer to some prior logs (ideally with tracebacks etc) with details on what was problematic in those zarrs.
Exactly when should the delayed error be raised?
e.g. after finishing a loop on zarrs if any zarr is problematic.
@yarikoptic The only Zarr backup steps that happen after the point where the RuntimeError is currently raised are: updating the checksum file and updating the s3sync.json file. If the error is "delayed", do you want these to happen or not?
e.g. after finishing a loop on zarrs if any zarr is problematic.
There is no "loop on zarrs". The code loops through all assets, starts up asynchronous tasks for any Zarrs, and then after all tasks are done, the results of the Zarr tasks are retrieved.
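One way the "accumulate, then raise" idea could look when the Zarrs run as concurrent tasks rather than in a loop is sketched below. This is an illustration under assumptions, not how backups2datalad is structured: it uses standard-library `asyncio.gather(..., return_exceptions=True)` instead of an anyio task group, and `sync_one_zarr`, `sync_all_zarrs` and the `bad_ids` parameter are hypothetical stand-ins.

```python
import asyncio
import logging

lgr = logging.getLogger("sketch")


async def sync_one_zarr(zarr_id, bad_ids):
    """Hypothetical stand-in for the real per-Zarr backup task."""
    await asyncio.sleep(0)
    if zarr_id in bad_ids:
        raise RuntimeError(
            f"Zarr {zarr_id}: local checksum differs from remote checksum"
        )
    return zarr_id


async def sync_all_zarrs(zarr_ids, bad_ids):
    # return_exceptions=True makes gather hand back raised exceptions as
    # results, so one failing Zarr does not stop the others.
    results = await asyncio.gather(
        *(sync_one_zarr(z, bad_ids) for z in zarr_ids),
        return_exceptions=True,
    )
    failed = {}
    for zarr_id, result in zip(zarr_ids, results):
        if isinstance(result, Exception):
            # Log the full traceback now; raise one summary error later
            lgr.error("Delaying error for Zarr %s", zarr_id, exc_info=result)
            failed[zarr_id] = result
    if failed:
        raise RuntimeError(
            f"{len(failed)} of {len(zarr_ids)} Zarrs failed: "
            f"{', '.join(failed)}. Details were logged"
        )
    return results


def demo():
    try:
        asyncio.run(sync_all_zarrs(["zarr-a", "zarr-b", "zarr-c"], {"zarr-b"}))
    except RuntimeError as exc:
        return str(exc)
    return None


print(demo())
```

The sketch leaves open the question raised above of whether the checksum file and s3sync.json should still be updated for a Zarr whose error was delayed; that decision would live inside the per-Zarr task.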
I haven't seen those for quite a while although we do get other errors etc! maybe #299 indeed resolved them?! let's let it RiP
continuation of #260 saga. Happened on the recent run for 000108 of
so workers was 4. The log is /mnt/backup/dandi/dandisets/.git/dandi/backups2datalad/2022.11.01.19.31.16Z.log just in case . 000108 was left in
state and I am not sure if any commit was done
I will reset and rerun with --workers 2 now. I might check some recently modified zarrs if all good first