GEOS-ESM / jedi_bundle

Repo for building JEDI packages
Apache License 2.0

Move to blobless clones by default? #40

Open mathomp4 opened 2 weeks ago

mathomp4 commented 2 weeks ago

This is an exploratory question for @Dooruk and others here.

Namely, every so often, the route from NASA to GitHub goes through some crazy bad path and it takes forever to clone anything from GitHub.

The SI Team's usual response to this is "Use blobless clones". Indeed, I make a remark about that on my JEDI-GEOS attempt wiki instructions.

A quick search of the internet also suggests that blobless clones are now the default for many people.

So, I'd like to add support for them here. I could pretty easily add the relevant option to:

https://github.com/GEOS-ESM/jedi_bundle/blob/ed1aeed8f146f0d43d6ecdc2df795f1568d2a714/src/jedi_bundle/utils/git.py#L144

and I am 99% sure this wouldn't affect anything. I should invoke @asewnath as #38 also goes near the git bits here.
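For concreteness, a minimal sketch of what "adding the relevant option" could look like. This is not the code at the linked line of `git.py`; the function name `build_clone_command` and its parameters are hypothetical, and only the `--filter=blob:none` flag itself is standard `git clone` behavior.

```python
def build_clone_command(url, branch, dest, blobless=True):
    """Return the argv list for cloning `url` at `branch` into `dest`.

    Hypothetical helper for illustration; the real clone call in
    jedi_bundle may be structured differently.
    """
    cmd = ["git", "clone"]
    if blobless:
        # Partial clone: commits and trees are fetched up front,
        # file contents (blobs) are fetched on demand later.
        cmd.append("--filter=blob:none")
    cmd += ["--branch", branch, url, dest]
    return cmd
```

With placeholder arguments, `build_clone_command("<url>", "<branch>", "<dest>")` yields `git clone --filter=blob:none --branch <branch> <url> <dest>`, and passing `blobless=False` gives the plain full clone.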

Perhaps instead it should be a true/false key-value in build.yaml, so that users can opt out of blobless clones?
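A rough sketch of the opt-out idea, assuming build.yaml has already been parsed into a Python dict. The key name `blobless_clone` is made up for illustration and is not an existing jedi_bundle option; the default of `True` just reflects the "by default" in the issue title.

```python
def clone_filter_args(build_config, default=True):
    """Return extra `git clone` arguments based on a boolean config key.

    `build_config` stands in for the parsed build.yaml contents;
    `blobless_clone` is a hypothetical key name.
    """
    blobless = build_config.get("blobless_clone", default)
    if not isinstance(blobless, bool):
        # YAML true/false parse to Python bools; reject anything else
        # rather than guessing what the user meant.
        raise ValueError("blobless_clone must be true or false")
    return ["--filter=blob:none"] if blobless else []
```

A user who wants full clones would then set the key to false, and the returned list would be empty, leaving the clone command unchanged.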

I ask for debate.

Dooruk commented 1 week ago

@mathomp4, are you able to run ctests within one of your blobless builds? I'm 98% sure this wouldn't impact anything, but we can test and see. We may want to use some previous versions of certain repos (e.g., CRTM), including some that are dedicated to data only (ufo-data with hashes), but that's just a matter of requesting certain versions.

For the most part, as far as CI tests and users are concerned, no changes will be made to the JEDI builds that require tracking history. For developers, though, I'm not sure whether blobless clones would impact pushing changes back to JEDI. However, if the SI team has been using them in GEOS, that should be fine?

I looked at the article and the video in the link you provided, and it all makes sense to me. I didn't know about this issue! JEDI uses git-lfs, but that is only for large file storage and does not seem to affect the histories.

mathomp4 commented 1 week ago

Blobless clones should always be fine if you have internet access. The price you pay for a quick initial clone is that subsequent Git operations that need content not yet downloaded will require a call to GitHub. So, for example, if you want to check out another branch or tag, it isn't automatically in the clone. Or rather, the blobs aren't automatically downloaded during cloning, so they must be retrieved from GitHub.

This is why I advise users that if they use blobless clones somewhere like discover, pretty much all git operations that do something "new" need to be done on a head node rather than a compute node, since the compute node can't see the Internet.
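As a small aid for that advice, here is a sketch of how a script could check whether an existing clone is blobless before running on a compute node. It relies on the `remote.<name>.partialclonefilter` config entry that `git clone --filter=...` records; the function name and the injectable `run` parameter are assumptions added for illustration.

```python
import subprocess


def partial_clone_filter(repo_path, remote="origin", run=subprocess.run):
    """Return the partial-clone filter recorded for `remote`, or None.

    A blobless clone is expected to report "blob:none". `run` is
    injectable only so the helper can be exercised without a real repo.
    """
    result = run(
        ["git", "-C", str(repo_path), "config", "--get",
         f"remote.{remote}.partialclonefilter"],
        capture_output=True,
        text=True,
        check=False,  # a missing key exits non-zero; treat that as "no filter"
    )
    value = result.stdout.strip()
    return value or None
```

If this returns `"blob:none"`, any checkout of a new branch or tag should be done on a head node first, so the needed blobs are already local by the time a compute node touches the repo.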

mathomp4 commented 1 week ago

But, yes, please test. I was just inspired by being annoyed at slow clones on discover. 😄