dandi / backups2datalad

Mirror Dandisets as git-annex repositories
MIT License
1 stars 0 forks source link

Doesn't update superdataset properly any more. backups2datalad process is "stuck" #39

Closed yarikoptic closed 7 months ago

yarikoptic commented 7 months ago

I noticed that https://github.com/dandi/dandisets says that last commit was 3 weeks ago. But we have lots of recently updated dandisets: image

Individual dandiset github repos seems also have recent updates ![image](https://github.com/dandi/backups2datalad/assets/39889/b1366ccd-9ed7-46fe-a790-815f923d9cee)
on drogon I see only some not pushed (why?) updates to 000026 in dandisets superdataset ```shell dandi@drogon:/mnt/backup/dandi/dandisets$ git log github/draft..draft commit a60c7bf88d82a350f538bd83594c163a3b29e51e (HEAD -> draft) Author: DANDI Team Date: Wed Mar 27 03:25:33 2024 -0400 1 updated (000026) 000026: - [backups2datalad] 22 files added, 10 files updated commit df1dce3a6fe1aa954dd818ce54420ab31b07e71d Author: DANDI Team Date: Tue Mar 26 04:10:35 2024 -0400 1 updated (000026) 000026: - [backups2datalad] 229 files added, 29 files updated ```
and git status shows that MANY dandisets have uncommitted changes (new commits OR modified content!!) on drogon ```shell dandi@drogon:/mnt/backup/dandi/dandisets$ git status Refresh index: 100% (612/612), done. On branch draft Your branch is ahead of 'github/draft' by 2 commits. (use "git push" to publish your local commits) Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) (commit or discard the untracked or modified content in submodules) modified: 000018 (new commits) modified: 000024 (new commits) modified: 000029 (modified content) modified: 000030 (new commits) modified: 000031 (new commits) modified: 000032 (new commits) modified: 000033 (new commits) modified: 000038 (new commits) modified: 000040 (new commits) modified: 000042 (new commits) ... ```

so most likely recent changes on handling embargoed/public dandisets broke this... Then I looked at ps on drogon -- and apparently there is still a process which runs in --verify mode since Mar 07!

As such it seems to be stuck:

dandi    2645263  0.0  0.0   2484   404 ?        Ss   Mar07   0:00     /bin/sh -c chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c '/mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify'
dandi    2645264  0.0  0.0  19748  5512 ?        S    Mar07   7:09       /usr/bin/perl /usr/bin/chronic flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2645317  0.0  0.0   5548   544 ?        S    Mar07   0:00         /usr/bin/flock -E 0 -e -n /home/dandi/.run/backup2datalad-cron-nonzarr.lock bash -c /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2645318  0.0  0.0   6952  2536 ?        S    Mar07   0:00           /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2650110  0.0  0.0   6952  1240 ?        S    Mar07   0:00             /bin/bash /mnt/backup/dandi/dandisets/tools/backups2datalad-update-cron --mode verify
dandi    2650139  0.0  0.3 1790364 200180 ?      Sl   Mar07   1:21               /home/dandi/miniconda3/envs/dandisets-2/bin/python3.1 /home/dandi/miniconda3/envs/dandisets-2/bin/backups2datalad -l WARNING --backup-root /mnt/backup/dandi --config tools/backups2datalad.cfg.yaml update-from-backup --workers 5 -e 000(026|108|243)$ --mode verify
dandi    2650140  0.0  0.0   6384  2272 ?        S    Mar07   0:00               grep -v nothing to save, working tree clean

py-spy'ing that process gives no useful detail

image

then I recalled that we have logs! and this is what I see

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandisets$ tail -f .git/dandi/backups2datalad/2024.03.07.06.50.32Z.log
2024-03-29T03:12:15-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 4292.823660 seconds as it raised PoolTimeout: 
2024-03-29T03:34:24-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000726/versions/draft/ in 8515.649940 seconds as it raised PoolTimeout: 
2024-03-29T04:12:28-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000727/versions/draft/ in 7882.410218 seconds as it raised PoolTimeout: 
2024-03-29T04:19:15-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000728/versions/draft/ in 8402.162677 seconds as it raised PoolTimeout: 
2024-03-29T04:24:48-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 8413.961004 seconds as it raised PoolTimeout: 
2024-03-29T05:34:07-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000725/versions/draft/ in 15890.092814 seconds as it raised PoolTimeout: 
2024-03-29T05:57:19-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000726/versions/draft/ in 17122.589867 seconds as it raised PoolTimeout: 
2024-03-29T06:24:50-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000727/versions/draft/ in 17075.343311 seconds as it raised PoolTimeout: 
2024-03-29T06:40:18-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000728/versions/draft/ in 16793.199329 seconds as it raised PoolTimeout: 
2024-03-29T06:46:02-0400 [WARNING ] backups2datalad: Retrying GET request to /dandisets/000729/versions/draft/ in 17087.073695 seconds as it raised PoolTimeout: 

some seems to be open (000726) and some embargoed (000729) but even for embargoed endpoint seems to work just fine:

❯ curl -X 'GET' 'https://api.dandiarchive.org/api/dandisets/000729/versions/draft/' -H 'accept: application/json'
{"detail":"Authentication credentials were not provided."}%  

so seems like a bug somewhere.

jwodder commented 7 months ago

I think this may be caused by one or more bugs in httpx (possibly this one) for which fixes have been released, but the version of httpx in the dandisets-2 env is not the latest. I suggest killing the backup process and updating all of the packages before starting a new process; I believe the latter step can be accomplished by adding the option --upgrade-strategy eager to this line.

yarikoptic commented 7 months ago

ok, I will try to do that now

yarikoptic commented 7 months ago

--upgrade-strategy eager added to installation of backups2datalad had no effect on httpx

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandisets$ pip show httpx | grep Version; chronic pip uninstall --yes backups2datalad; pip install --upgrade-strategy eager git+https://github.com/dandi/backups2datalad | grep httpx; pip show httpx | grep Version
Version: 0.25.1
  Running command git clone --filter=blob:none --quiet https://github.com/dandi/backups2datalad /tmp/pip-req-build-usy7krrb
Requirement already satisfied: httpx~=0.22 in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from backups2datalad==0.0.0) (0.25.1)
Requirement already satisfied: certifi in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx~=0.22->backups2datalad==0.0.0) (2022.12.7)
Requirement already satisfied: httpcore in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx~=0.22->backups2datalad==0.0.0) (1.0.2)
Requirement already satisfied: h11<0.15,>=0.13 in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpcore->httpx~=0.22->backups2datalad==0.0.0) (0.14.0)
Version: 0.25.1

so I just manually upgraded it:

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandisets$ pip install --upgrade httpx
Requirement already satisfied: httpx in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (0.25.1)
Collecting httpx
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: anyio in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx) (4.0.0)
Requirement already satisfied: certifi in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx) (2022.12.7)
Requirement already satisfied: httpcore==1.* in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx) (1.0.2)
Requirement already satisfied: idna in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx) (3.4)
Requirement already satisfied: sniffio in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpx) (1.3.0)
Requirement already satisfied: h11<0.15,>=0.13 in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from httpcore==1.*->httpx) (0.14.0)
Requirement already satisfied: exceptiongroup>=1.0.2 in /home/dandi/miniconda3/envs/dandisets-2/lib/python3.10/site-packages (from anyio->httpx) (1.1.3)
Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 1.8 MB/s eta 0:00:00
Installing collected packages: httpx
  Attempting uninstall: httpx
    Found existing installation: httpx 0.25.1
    Uninstalling httpx-0.25.1:
      Successfully uninstalled httpx-0.25.1
Successfully installed httpx-0.27.0
yarikoptic commented 7 months ago

I think stuckness is resolved with the upgrade, thanks!