bids-standard / maintenance-tools

Miscellaneous maintenance tools for the broader BIDS ecosystem.
MIT License

con/tinuous archival of the CI logs #11

Open yarikoptic opened 2 years ago

yarikoptic commented 2 years ago

We (primarily @jwodder) developed con/tinuous (https://github.com/con/tinuous/) to archive logs from various CI services. Unfortunately, CircleCI is not yet supported: we have not had a use case for it, so it has not been implemented (https://github.com/con/tinuous/issues/124). But even without CircleCI, I think it would be good to have such an archive.

Typically I just keep those logs on a server that is accessible to all project members. But the BIDS project does not have such a private, SSH-accessible server, so it might be better to just publish them to GitHub.

sappelhoff commented 2 years ago

> Are you aware of any sensitive information in those logs so we might better not share them publicly?

To my knowledge, the CircleCI logs are openly accessible anyhow. I also just tried accessing the GitHub Actions logs in a browser without being logged in, and I think the individual steps are not accessible there.

More importantly: I don't think there is any sensitive information in these logs, so we may archive them without problems.

> Since there will be "subdatasets" (I usually do one per month)

you mean a new repo each month?

Couldn't we just have a single repo and keep pushing the logs there? Or will we run into storage space issues?

sappelhoff commented 1 year ago

transferring to maintenance-tools repo

yarikoptic commented 1 year ago

> > Since there will be "subdatasets" (I usually do one per month)
>
> you mean a new repo each month?
>
> Couldn't we just have a single repo and keep pushing the logs there? Or will we run into storage space issues?

We eventually might (in particular, cloning would take a while):

(git)smaug:/mnt/datasets/datalad/ci/bids-specification[master]git
$> du -scm */*/.git/objects
29  2022/07/.git/objects
116 2022/08/.git/objects
58  2022/09/.git/objects
54  2022/10/.git/objects
51  2022/11/.git/objects
20  2022/12/.git/objects
14  2023/01/.git/objects
46  2023/02/.git/objects
14  2023/03/.git/objects
399 total
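
For a rough sense of how fast a single repo would grow, the listing above can be summarized with a few lines of Python. This is only a sketch: the numbers are copied from the `du -scm` output, `parse_du` is a hypothetical helper, and the yearly figure is a naive extrapolation from these nine months rather than a measurement.

```python
# Per-month sizes in MiB, copied verbatim from the `du -scm` listing above.
DU_OUTPUT = """\
29  2022/07/.git/objects
116 2022/08/.git/objects
58  2022/09/.git/objects
54  2022/10/.git/objects
51  2022/11/.git/objects
20  2022/12/.git/objects
14  2023/01/.git/objects
46  2023/02/.git/objects
14  2023/03/.git/objects
399 total
"""


def parse_du(text):
    """Return ({"YYYY/MM": MiB}, reported_total) parsed from `du -scm` output."""
    sizes, total = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        size, path = line.split(None, 1)
        if path.strip() == "total":
            total = int(size)
        else:
            year, month = path.split("/")[:2]
            sizes[f"{year}/{month}"] = int(size)
    return sizes, total


sizes, reported_total = parse_du(DU_OUTPUT)
mean_per_month = sum(sizes.values()) / len(sizes)
print(
    f"{len(sizes)} months, mean {mean_per_month:.1f} MiB/month, "
    f"roughly {12 * mean_per_month:.0f} MiB/year if that rate holds"
)
```

The per-month values add up to slightly more than the reported total, presumably because `du` rounds each entry up to a whole MiB. Either way, this comes to about 45 MiB per month, so a single repo would grow by something like half a GiB per year at this rate.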

It could be done in a single repo, but it would keep growing, and would you care about logs from years back when troubleshooting a recent issue? Not really. With submodules it all scales nicely, in my experience. datalad foreach-dataset can help make this seamless, e.g.:

(git)smaug:/mnt/datasets/datalad/ci/bids-specification/2023[master]git
$> datalad foreach-dataset --o-s relpath -r -J10 git grep 'test_bids_datasets.*asl003.*FAILED' | head
02/02/pr/1385/5b3087b/github-schemacode_ci-1865-incomplete/1_ubuntu-latest with Python 3.7.txt:2023-02-02T20:32:06.3612428Z tests/test_validator.py::test_bids_datasets[asl003] FAILED
02/02/pr/1385/5b3087b/github-schemacode_ci-1865-incomplete/2_ubuntu-latest with Python 3.8.txt:2023-02-02T20:32:05.0886411Z tests/test_validator.py::test_bids_datasets[asl003] FAILED
02/02/pr/1385/5b3087b/github-schemacode_ci-1865-incomplete/3_ubuntu-latest with Python 3.9.txt:2023-02-02T20:32:10.7129902Z tests/test_validator.py::test_bids_datasets[asl003] FAILED
...
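
To illustrate why the per-month layout keeps such searches cheap, here is a minimal Python sketch of a similar search limited to selected months. It does not use DataLad at all. It assumes the YYYY/MM/... directory layout with `.txt` log files seen in the output above and that the months of interest are already present locally. `grep_logs` and its `months` parameter are hypothetical names, not part of tinuous or DataLad.

```python
import re
from pathlib import Path


def grep_logs(root, pattern, months=None):
    """Yield (relative_path, line) for every log line matching `pattern`.

    Assumes `root` contains YYYY/MM/ directories with *.txt logs somewhere
    below them. `months`, if given, is a collection of "YYYY/MM" strings
    that restricts the search to those months.
    """
    root = Path(root)
    regex = re.compile(pattern)
    month_dirs = sorted(
        p for p in root.glob("[0-9][0-9][0-9][0-9]/[0-9][0-9]") if p.is_dir()
    )
    for month_dir in month_dirs:
        key = month_dir.relative_to(root).as_posix()
        if months is not None and key not in months:
            continue  # skip months we do not care about
        for log in sorted(month_dir.rglob("*.txt")):
            if ".git" in log.parts:
                continue  # ignore anything inside git metadata
            with log.open(errors="replace") as fh:
                for line in fh:
                    if regex.search(line):
                        yield log.relative_to(root).as_posix(), line.rstrip("\n")


if __name__ == "__main__":
    # Search only February 2023, relative to the current directory.
    for path, line in grep_logs(
        ".", r"test_bids_datasets.*asl003.*FAILED", months={"2023/02"}
    ):
        print(f"{path}:{line}")
```

With one subdataset per month, only the months passed in `months` need to be cloned and scanned, which is the scalability argument made above.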