dandi / dandisets

735 Dandisets, 812.2 TB total. DataLad super-dataset of all Dandisets from https://github.com/dandisets

zarr: does unnecessary "Counting up files..." #300

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

E.g.

2022-11-15T14:54:57-0500 [INFO    ] backups2datalad: Dandiset 000108: Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: backup up to date
2022-11-15T14:54:58-0500 [INFO    ] backups2datalad: Dandiset 000108: Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: no changes; not committing
2022-11-15T14:54:58-0500 [INFO    ] backups2datalad: Dandiset 000108: Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: Counting up files ...
...
2022-11-15T14:55:08-0500 [INFO    ] backups2datalad: Dandiset 000108: Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: Done counting up files

And for that one we already have everything computed: the checksum matches, and .git/config was not modified after this run, so the stats entry was already there before:

(dandisets) dandi@drogon:/mnt/backup/dandi/heroku-logs/dandi-api$ git -C ../../dandizarrs/fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78/ log -1
commit baea399334adb1fb6745818bf1b95c0cdcd694d8 (HEAD -> draft)
Author: DANDI User <info@dandiarchive.org>
Date:   Tue Oct 11 22:24:30 2022 +0000

    [backups2datalad] 32327 files added, 254 files deleted, checksum updated

(dandisets) dandi@drogon:/mnt/backup/dandi/heroku-logs/dandi-api$ ls -ld ../../dandizarrs/fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78/.git/config
-rw-r--r-- 1 dandi dandi 949 Nov  9 13:26 ../../dandizarrs/fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78/.git/config

(dandisets) dandi@drogon:/mnt/backup/dandi/heroku-logs/dandi-api$ tail -n 4 ../../dandizarrs/fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78/.git/config
        merge = refs/heads/draft
[dandi]
        github-description = 34622 files, 42.7 GB
        stats = baea399334adb1fb6745818bf1b95c0cdcd694d8,34622,42708469282

So there should not even be any "Counting up files" messages, nor any time spent on counting.
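
The expected behavior can be sketched as follows. This is only an illustration, not the actual backups2datalad code: it assumes the `dandi.stats` value is laid out as `<commit>,<file count>,<total size>` (as the config above suggests), and the names `CachedStats`, `parse_stats`, `get_stats`, and the `count_files` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class CachedStats:
    commit: str  # commit hash the stats were computed for
    files: int   # number of files
    size: int    # total size in bytes


def parse_stats(value: str) -> Optional[CachedStats]:
    """Parse an assumed "<commit>,<files>,<size>" string; None if malformed."""
    parts = value.strip().split(",")
    if len(parts) != 3:
        return None
    commit, files, size = parts
    try:
        return CachedStats(commit, int(files), int(size))
    except ValueError:
        return None


def get_stats(
    head_commit: str,
    stored: Optional[str],
    count_files: Callable[[], Tuple[int, int]],
    log: Callable[[str], None] = print,
) -> Tuple[int, int]:
    """Return (files, size), recounting only when the cache is missing or stale."""
    cached = parse_stats(stored) if stored is not None else None
    if cached is not None and cached.commit == head_commit:
        # Cache was computed for the current HEAD: no counting, no log messages.
        return cached.files, cached.size
    log("Counting up files ...")
    files, size = count_files()
    log("Done counting up files")
    return files, size
```

With the values shown above, HEAD is `baea399334adb1fb6745818bf1b95c0cdcd694d8` and the stored stats start with the same hash, so a check like this would return `(34622, 42708469282)` without calling `count_files` or logging anything.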

jwodder commented 1 year ago

@yarikoptic Is the problem just that the "Counting up files" message was printed, or do you have reason to believe that the code actually re-counted all the files?

yarikoptic commented 1 year ago

Given that it took seconds between the messages, I suspect it was actually counting. If it isn't, the messages shouldn't appear and keep flooding the logs with misleading information.
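
For reference, the gap can be measured directly from the two log lines quoted above. A minimal sketch, assuming each line starts with an ISO-style timestamp followed by a space; `log_timestamp` is a hypothetical helper, not part of backups2datalad:

```python
from datetime import datetime


def log_timestamp(line: str) -> datetime:
    """Parse the leading timestamp (e.g. 2022-11-15T14:54:58-0500) of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S%z")


start_line = (
    "2022-11-15T14:54:58-0500 [INFO    ] backups2datalad: Dandiset 000108: "
    "Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: Counting up files ..."
)
end_line = (
    "2022-11-15T14:55:08-0500 [INFO    ] backups2datalad: Dandiset 000108: "
    "Zarr fb7a4c1f-6501-4242-ab42-9fb8b6e2cb78: Done counting up files"
)

# Seconds elapsed between the "Counting up" and "Done counting" messages.
elapsed = (log_timestamp(end_line) - log_timestamp(start_line)).total_seconds()
print(elapsed)  # 10.0
```

That is 10 seconds between the two messages for a Zarr whose stats were already cached.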