argonne-lcf / user-guides

ALCF Systems User Documentation
https://docs.alcf.anl.gov/

Need better solution for storing large/many binary assets that aren't hosted on main ALCF site #533

Open felker opened 2 weeks ago

felker commented 2 weeks ago

Came up in discussion over #528, where @kaushikvelusamy wanted to upload a ~10 MB PDF slide deck to link to in the docs. These exact slides were never presented at an ALCF tech talk or workshop, so they are not already hosted on https://www.alcf.anl.gov/ and this is a bit of an edge case.

As a rule of thumb, I am hesitant to add binary files to version control if 1) we don't really care about versioning the files, 2) they are modified frequently, and/or 3) they are larger than a few MB. In some of those cases, or if you have too many such files in the repo history, they can bloat the .git/ size and make cloning/checkout/push/pull slower over time.
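One way to put numbers on the bloat concern is to list the largest blobs anywhere in the repository history, not just in the current checkout. The sketch below uses only standard Git plumbing (`git rev-list` and `git cat-file --batch-check`); the `largest_blobs` function name is made up for illustration, and paths containing newlines are not handled.

```shell
# largest_blobs [N]: print the N largest blobs reachable from any ref,
# one per line as "<size in bytes> blob <path>", largest first.
# Hypothetical helper; must be run from inside a Git working tree.
largest_blobs() {
  local n="${1:-10}"
  # rev-list emits "<sha> <path>" for every object in history;
  # cat-file annotates each line with the object's size and type;
  # awk keeps only blobs (file contents), dropping commits and trees.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
    awk '$2 == "blob"' |
    sort -rn |
    head -n "$n"
}
```

Run from inside a clone, for example `largest_blobs 20`. For comparison, `git count-objects -vH` reports the total size of the packed object store.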

Currently in docs/, we are doing OK with only a few such files larger than a MB:

$ find . -type f -not -name "*.md" -exec du -hs {} \; | sort -h
...
1020K   ./aurora/performance-tools/images/GPU-offload-03.png
1.3M    ./aurora/performance-tools/images/FireFox-VTune05.png
1.3M    ./services/files/docker_hub_repo_build.gif
1.6M    ./ai-testbed/files/home-cerebras-sambanova.png
1.7M    ./aurora/images/Argonne_wireframe_white_transparent.eps
1.7M    ./images/Argonne_wireframe_white_transparent.eps
1.8M    ./policies/accounts/IT_Access_Agreement_for_ALCF.pdf
2.0M    ./services/files/singularity_build.gif

We are already storing 204 PNGs, 2 PDFs, 7 GIFs, 18 JPGs, 3 Microsoft Word documents, and 2 EPS files (ANL wireframe logos).
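A per-type tally like this can be reproduced with a small helper along the following lines. This is a sketch rather than the command originally used: `count_by_extension` is a hypothetical name, extensions are lowercased so `.PNG` and `.png` are grouped (but `.jpg` and `.jpeg` are still counted separately), and dotfiles and files without an extension are skipped.

```shell
# count_by_extension [DIR]: count non-Markdown files under DIR by extension.
# Hypothetical helper; output is "<count> <extension>", most common first.
count_by_extension() {
  local dir="${1:-.}"
  # First awk takes the basename, so dots in directory names are ignored.
  # Second awk prints the lowercased text after the last dot, skipping
  # names with no dot and names that start with a dot.
  find "$dir" -type f -not -name "*.md" -not -path "*/.git/*" |
    awk -F/ '{ print $NF }' |
    awk -F. 'NF > 1 && $1 != "" { print tolower($NF) }' |
    sort |
    uniq -c |
    sort -k1,1nr -k2
}
```

For example, `count_by_extension docs` would tally everything under `docs/`.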

Binary files like images are fine for now, since they are directly included/used in the Markdown source, and you can preview the Markdown rendering with the images locally without running mkdocs. But any file that is simply linked to, like the PDFs and .docx, should be removed from this repo, for example: https://github.com/argonne-lcf/user-guides/blob/48f2566540a1469db79ed05edefa1931d5fc80a3/docs/account-project-management/project-management/project-reports.md?plain=1#L45-L48
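To list the files that are only linked to, one option is to grep the Markdown sources for link targets ending in a document extension. A minimal sketch, assuming GNU or BSD `grep` (for `-r` and `--include`) and inline-style Markdown links; `linked_assets` is a hypothetical helper, and the extension list (`pdf`, `doc`, `docx`) is an assumption based on the file types mentioned above.

```shell
# linked_assets [DIR]: print "file:line:match" for every inline Markdown
# link under DIR whose target ends in .pdf, .doc, or .docx.
# Hypothetical helper; links with a quoted title after the URL are not matched.
linked_assets() {
  local dir="${1:-.}"
  grep -rnoE --include='*.md' '\]\([^)[:space:]]*\.(pdf|docx?)\)' "$dir"
}
```

Calling `linked_assets docs` prints one line per matching link, which gives a starting list of files that could move out of the repo. The function returns a non-zero status when nothing matches.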

Some ideas for alternatives:

kevin-harms commented 2 weeks ago

You could ask Beth to host it on the site with other presentational materials.

kevin

On Thu, Nov 7, 2024 at 6:32 PM Kyle Gerard Felker wrote:


felker commented 2 weeks ago

Yeah, that is likely, but it will be tabled until after SC24.