Closed yarikoptic closed 1 year ago
Even after a complete run there are no updates to stats... I suspect that it simply doesn't do what it is supposed to do. E.g. the last commits
commit 98b24e00db8296260c75ffe1435a9f91877ca721 (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Tue Oct 4 11:04:22 2022 +0000
[backups2datalad] 18 files updated
.dandi/assets-state.json | 2 +-
.dandi/assets.json | 396 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------------------------------
2 files changed, 199 insertions(+), 199 deletions(-)
commit 6cb82de90a3b587954497280eda77764f0c92517
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 30 16:28:31 2022 +0000
[backups2datalad] 1 file updated
.dandi/assets-state.json | 2 +-
.dandi/assets.json | 450 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------------------------------
2 files changed, 226 insertions(+), 226 deletions(-)
suggest updates to some (18!) files, but there are no updates to files, only to the metadata listings. So most likely the subproject states were not updated. I am now running with --verify-timestamps, but I do not think it should change anything. While working on 000108 the server is busy with many git-annex processes working on Zarrs, so my suspicion is that updates to them are just not reflected in the Dandiset's submodule states... Actually, while it is still running I see changes to the state of about 18 .zarrs in the Dandiset's status; let's see whether those get committed when this run finishes.
So indeed, it finished updating but didn't commit those modified .zarr states! I did git commit --amend to commit them. Any immediate fix ideas, @jwodder? Otherwise I will look into it tomorrow. Meanwhile, stats still aren't populated in the .git/config of any of those Zarrs, so that is yet another aspect to debug.
@yarikoptic I believe this is happening because the code added in #272 only applies to Zarrs stat'ed while stat'ing a containing Dandiset, but that (usually) only happens when running the update-github-metadata command. When running update-from-backup, Zarrs are instead stat'ed here, and the results are cached for use when calculating the stats of the Dandiset. I think the fix will involve passing an argument to get_stats() indicating whether the dataset is a Zarr.
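A minimal sketch of that proposal, under stated assumptions: the get_stats() signature, the DatasetStats type, and the "stats" key are hypothetical (the real function's parameters are not shown in this thread), a plain dict stands in for the dataset's .git/config, and the "<commit>,<files>,<bytes>" value layout is inferred from the stats = ... lines that appear later in this thread, so the meaning of the two numeric fields is an assumption.

```python
from collections.abc import Callable, MutableMapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    # Hypothetical container: number of files and total size in bytes.
    files: int
    size: int


def get_stats(
    config: MutableMapping[str, str],
    head_commit: str,
    compute: Callable[[], DatasetStats],
    is_zarr: bool = False,
) -> DatasetStats:
    """Return stats for a dataset, caching them only when is_zarr is True.

    `config` stands in for the dataset's .git/config; the cached value
    is "<commit>,<files>,<size>" under the hypothetical key "stats".
    """
    if is_zarr:
        cached = config.get("stats")
        if cached is not None:
            commit, files, size = cached.split(",")
            # Reuse the cached value only if it was computed at the current commit.
            if commit == head_commit:
                return DatasetStats(int(files), int(size))
    # Cache miss (or not a Zarr): do the expensive computation.
    stats = compute()
    if is_zarr:
        config["stats"] = f"{head_commit},{stats.files},{stats.size}"
    return stats
```

Dropping the is_zarr checks would turn this into a cache for every dataset regardless of type; the flag only matters if non-Zarr datasets should deliberately be recomputed each time.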
"I think the fix will involve passing an argument to get_stats() indicating whether the dataset is a Zarr"
Why would it matter whether it is a Zarr or not? I think stats could be "cached" in .git/config regardless of the dataset "type", can't they?
@yarikoptic They could be cached for non-Zarrs as well, but they're currently not.
OK, working on a PR to fix the stats situation. Could you analyze why the .zarr state updates were not committed and send a PR to fix that aspect?
@yarikoptic Can you identify the logfile for the run that should have saved updates but didn't?
My bet would be on both of these two, the largest from yesterday that mention 000108:
(base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ ls -lSa 2022.10.05*log | head -n 2
-rw-r--r-- 1 dandi dandi 573173789 Oct 5 19:02 2022.10.05.00.08.10Z.log
-rw-r--r-- 1 dandi dandi 18977818 Oct 5 01:57 2022.10.05.05.50.07Z.log
(base) dandi@drogon:/mnt/backup/dandi/dandisets/.git/dandi/backups2datalad$ grep -l 000108 2022.10.05.00.08.10Z.log 2022.10.05.05.50.07Z.log
2022.10.05.00.08.10Z.log
2022.10.05.05.50.07Z.log
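The same lookup (largest logs for a given day, then only those mentioning a Dandiset ID) can be sketched in Python. This is an illustrative helper, not part of backups2datalad; the function name is hypothetical, and it follows the shell pipeline's order: pick the largest files first, then keep those that contain the search string.

```python
from pathlib import Path


def largest_logs_mentioning(
    log_dir: Path, pattern: str, needle: str, top: int = 2
) -> list[Path]:
    """Like `ls -S <pattern> | head -n <top>` followed by `grep -l <needle>`."""
    # Largest files first, then keep only the first `top` of them.
    logs = sorted(
        log_dir.glob(pattern), key=lambda p: p.stat().st_size, reverse=True
    )[:top]
    needle_bytes = needle.encode()
    hits = []
    for log in logs:
        # Stream line by line so multi-hundred-MB logs are not read into memory at once.
        with log.open("rb") as f:
            if any(needle_bytes in line for line in f):
                hits.append(log)
    return hits
```

For example, largest_logs_mentioning(Path("."), "2022.10.05*log", "000108") run inside the log directory performs the same check as the two commands above.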
@yarikoptic 2022.10.05.05.50.07Z.log is a logfile for a non-108 backup; it only mentions 108 when it skips it. 2022.10.05.00.08.10Z.log contains 26 lines saying that a Zarr is about to be committed and another 26 lines saying that a commit to a Zarr was completed.
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000108$ git reflog -n 2
79173d8585 (HEAD -> draft, github/draft, github/HEAD) HEAD@{0}: commit (amend): [backups2datalad] 26 files updated
7c1e8a4a2a HEAD@{1}: commit: [backups2datalad] 26 files updated
so here they are:
(base) dandi@drogon:/mnt/backup/dandi/dandisets/000108$ git diff --stat 7c1e8a4a2a..79173d8585 | nl
1 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-1_SPIM.ome.zarr | 2 +-
2 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-2_SPIM.ome.zarr | 2 +-
3 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-3_SPIM.ome.zarr | 2 +-
4 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-4_SPIM.ome.zarr | 2 +-
5 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-5_SPIM.ome.zarr | 2 +-
6 ...es-20220316h10m52s23_sample-12_stain-LEC_run-1_chunk-6_SPIM.ome.zarr | 2 +-
7 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-1_SPIM.ome.zarr | 2 +-
8 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-2_SPIM.ome.zarr | 2 +-
9 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-3_SPIM.ome.zarr | 2 +-
10 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-4_SPIM.ome.zarr | 2 +-
11 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-5_SPIM.ome.zarr | 2 +-
12 ...ses-20220316h10m52s23_sample-12_stain-NN_run-1_chunk-6_SPIM.ome.zarr | 2 +-
13 ...ses-20220316h10m52s23_sample-12_stain-YO_run-1_chunk-2_SPIM.ome.zarr | 2 +-
14 ...ses-20220316h10m52s23_sample-12_stain-YO_run-1_chunk-3_SPIM.ome.zarr | 2 +-
15 ...ses-20220316h10m52s23_sample-12_stain-YO_run-1_chunk-4_SPIM.ome.zarr | 2 +-
16 ...ses-20220316h10m52s23_sample-12_stain-YO_run-1_chunk-5_SPIM.ome.zarr | 2 +-
17 ...ses-20220316h10m52s23_sample-12_stain-YO_run-1_chunk-6_SPIM.ome.zarr | 2 +-
18 ...es-20220316h16m47s38_sample-13_stain-LEC_run-1_chunk-1_SPIM.ome.zarr | 2 +-
19 ...es-20220316h16m47s38_sample-13_stain-LEC_run-1_chunk-5_SPIM.ome.zarr | 2 +-
20 ...es-20220316h16m47s38_sample-13_stain-LEC_run-1_chunk-6_SPIM.ome.zarr | 2 +-
21 ...ses-20220316h16m47s38_sample-13_stain-NN_run-1_chunk-4_SPIM.ome.zarr | 2 +-
22 ...ses-20220316h16m47s38_sample-13_stain-NN_run-1_chunk-5_SPIM.ome.zarr | 2 +-
23 ...ses-20220316h16m47s38_sample-13_stain-NN_run-1_chunk-6_SPIM.ome.zarr | 2 +-
24 ...ses-20220316h16m47s38_sample-13_stain-YO_run-1_chunk-1_SPIM.ome.zarr | 2 +-
25 ...ses-20220316h16m47s38_sample-13_stain-YO_run-1_chunk-3_SPIM.ome.zarr | 2 +-
26 ...ses-20220316h16m47s38_sample-13_stain-YO_run-1_chunk-4_SPIM.ome.zarr | 2 +-
27 ...ses-20220316h16m47s38_sample-13_stain-YO_run-1_chunk-5_SPIM.ome.zarr | 2 +-
28 27 files changed, 27 insertions(+), 27 deletions(-)
hm -- 27 not 26, but close ;)
FWIW, I wonder whether it is some kind of datalad issue: if we use datalad.save, maybe it is not committing changes that are already present in the index...
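One way to test that guess after a run is to check whether anything is still staged in the Dandiset's index once the save has returned. A minimal sketch, assuming the plain git CLI rather than any DataLad API; the helper names are hypothetical. It parses git status --porcelain (format v1), where the first column is the index status, and simplifies by ignoring quoting of unusual paths and the "old -> new" form of renames.

```python
import subprocess


def staged_paths(porcelain: str) -> list[str]:
    """Return paths whose index (first) column shows a staged change."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        index_status, path = line[0], line[3:]
        # ' ' = nothing staged, '?' = untracked, '!' = ignored.
        if index_status not in " ?!":
            paths.append(path)
    return paths


def leftover_staged_changes(repo: str) -> list[str]:
    """Run `git status --porcelain` in `repo` and report still-staged paths."""
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return staged_paths(out)
```

A non-empty result right after a save would point at the save leaving index changes uncommitted, which is the behavior suspected here.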
@yarikoptic I believe the problem is that we only commit a Dandiset backup if any(r["state"] != "clean" for r in self.ds.status()) is true, and I'm guessing that this check doesn't detect changes in uninstalled subdatasets.
But AFAIK a commit did happen; it just didn't commit those paths, only the stuff under .dandi/
@yarikoptic Oh, I was looking at the wrong part of the log file. Your guess above might be it.
If it is, could you file an issue with datalad with a reproducer please?
FWIW, for now I have upgraded datalad from 0.17.2 to 0.17.6 and will give the update a run so we could at least possibly populate those stats records.
@yarikoptic Issue filed: https://github.com/datalad/datalad/issues/7074
I think the original issue is resolved by now:
(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandizarrs$ grep stats */.git/config | nl | tail
3996 ff89a092-830a-440c-8953-3f868ae9397a/.git/config: stats = 8541e2c239b44fd8afb9b4eacc32afba919a6bb8,103962,90955848774
3997 ff8b273a-8f20-470e-b0d5-b316d6279a55/.git/config: stats = 957130d6a4041efd9d493f7aad2d916752e11b98,51259,12432769841
3998 ffaee4f2-f307-43cd-a035-c0dbe00b1d51/.git/config: stats = eadc8eb2341cd0f7ee13aef0be723ae7a7d0f5fc,96936,96917197195
3999 ffc75449-ff75-4a33-84f9-3ac6eee72d8e/.git/config: stats = 7184689cca24370c3e00e324b0d6659a37559ef2,52129,50129090934
4000 ffd7e5bd-7d35-41bd-8e40-36cd9d0cade5/.git/config: stats = 91c6b31073cb80ecd4fd6472fc3a218d6a1c444b,29026,48203187348
4001 ffdce32f-6cde-4ddc-ba29-46470f5bf7de/.git/config: stats = ad1a4934ca2af99ab01259ccfccade581f7e81d4,69094,79471372783
4002 ffdf2cc5-5e48-4105-b9c3-37ad9c8bcb88/.git/config: stats = ce44ead23da0d3fccd5360cdd170172ed8b77d05,83735,101481881467
4003 ffe18d8f-799a-4f92-aae1-700c34d53a66/.git/config: stats = 08c6badcd444602b1d531fecab94bc06c81e0312,96616,46161959101
4004 fff0788e-5535-4afa-8058-bb00f5687053/.git/config: stats = e45b378a233038271d00ac36a161f0ce439a76e8,96020,26907917584
4005 fff40b1a-e8ea-45af-8be8-4e97d901def6/.git/config: stats = 2fd47f34532d467fe8fc10f7e27267b8245d3b50,63579,41202122498
(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandizarrs$ ls -ld *-*-*-* | nl | tail -n 1
4005 drwxr-sr-x 1 dandi dandi 126 Jul 5 2022 fff40b1a-e8ea-45af-8be8-4e97d901def6
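The check above (number of stats lines vs. number of Zarr directories) can also be written to list exactly which Zarrs lack a record. A sketch under assumptions: the function name is hypothetical, each Zarr dataset is taken to have its own .git directory (as the grep output suggests), and a line of the form stats = <value> anywhere in .git/config counts as a record, regardless of which section it sits in.

```python
import re
from pathlib import Path

# A "stats = <value>" line, as seen in the grep output above.
STATS_RE = re.compile(r"^\s*stats\s*=\s*(\S+)\s*$", re.MULTILINE)


def zarrs_missing_stats(root: Path) -> list[str]:
    """Return names of UUID-named dataset dirs whose .git/config has no stats line."""
    missing = []
    # Same glob as `ls -ld *-*-*-*` to select the Zarr dataset directories.
    for ds in sorted(root.glob("*-*-*-*")):
        if not ds.is_dir():
            continue
        config = ds / ".git" / "config"
        text = config.read_text() if config.is_file() else ""
        if not STATS_RE.search(text):
            missing.append(ds.name)
    return missing
```

An empty list for /mnt/backup/dandi/dandizarrs would confirm the same thing as the matching 4005 counts.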
This is more of a "note-taking" issue.
#272 added storing stats for Zarrs within .git/config to speed up computation of the entire Dandiset sizes. The diff looks kosher to me, and I think we had one full backup run on 000108, but so far we got only 1 such record across all Zarrs, which was odd to me since AFAIK all Zarrs for a Dandiset should get it... but this one is not from 000108 but from 000243! ;)
The current backup run for 000108 is still going, so I guess we need to wait for it to complete before taking a more conclusive look at the situation.