E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License

Discussion about use of https://web.lcrc.anl.gov/ in CI #900

Open xylar opened 4 days ago

xylar commented 4 days ago

We have been getting downloads throttled by LCRC on https://web.lcrc.anl.gov/ because we use too much bandwidth. This is affecting research.

We should seek alternatives to https://web.lcrc.anl.gov/ in our CI, e.g.: https://github.com/E3SM-Project/e3sm_diags/blob/ca41b0e5d913610c88410928951f1ed11c75663f/tests/integration/download_data.py#L89

xylar commented 4 days ago

@mahf708, do you have suggestions (e.g. containers) that we could use for e3sm_diags instead of downloading directly from the LCRC server?

xylar commented 4 days ago

We are seeing time-outs in https://github.com/conda-forge/e3sm_diags-feedstock/pull/38, which are likely to cause ongoing trouble building conda packages.

xylar commented 4 days ago

I was able to get CI to pass on https://github.com/conda-forge/e3sm_diags-feedstock/pull/38 after restarting 3 times.

xylar commented 4 days ago

After 8 attempts, I was finally able to get the conda package to build. Needless to say, this is not sustainable.

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=1086951&view=logs&j=c1df603f-8689-50eb-30e7-21f597a4c2a3&t=e484c5e5-b8bf-5dcd-d19a-a6e998d41ead

mahf708 commented 4 days ago

I will write a more detailed comment, but I think we should use a container. I can make one for e3sm diags like the others I made for testing in https://github.com/E3SM-Project/containers

mahf708 commented 4 days ago

It's been on my list of todos to get a generic conda container that has some of our data from the servers...

I disabled two workflows (scream defaults and mkatmsrf...) for this very reason.

chengzhuzhang commented 4 days ago

@xylar yes, this has become an outstanding issue, and we should find alternatives for hosting the data needed for CI. Does mpas-analysis have a similar need, or is it handled differently?

@mahf708 it looks like the containers repo already has code for handling data from the input data directory, so it sounds like we can just mimic that to add the diagnostics data.

xylar commented 4 days ago

@chengzhuzhang, this issue doesn't affect MPAS-Analysis because we don't try to do anything so sophisticated in CI. I still run tests manually on Chrysalis as needed.

tomvothecoder commented 4 days ago

I had a suspicion that running the GH Actions build with Python 3.9-3.12, with each run simultaneously downloading the same data, would get us throttled by LCRC. If a short-term solution is needed, we can make GH Actions run only when a PR is marked as ready for review. A possible alternative that was mentioned before is to cache the diagnostic data on GitHub Actions, then update the cache if updated diags data on LCRC is detected.
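To make the caching idea concrete, here is a minimal sketch of a download step that reuses a local cache and only contacts the server when a file is missing or outdated. It assumes a checksum is known for each file (for example from a small manifest kept in the repo), which is not something e3sm_diags is confirmed to have. The names `fetch_if_stale`, `sha256_of`, and the `download` callback are hypothetical and are not part of `download_data.py`.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_if_stale(
    name: str,
    expected_sha256: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> bool:
    """Download `name` into `cache_dir` only if the cached copy is missing or outdated.

    `download(name, target)` is a hypothetical callback that writes the file
    to `target`. Returns True if a download happened, False on a cache hit.
    """
    target = cache_dir / name
    target.parent.mkdir(parents=True, exist_ok=True)

    # Cache hit: the file exists and matches the expected checksum,
    # so no request is sent to the server.
    if target.is_file() and sha256_of(target) == expected_sha256:
        return False

    download(name, target)

    # Verify what we just fetched so a truncated download is not cached silently.
    if sha256_of(target) != expected_sha256:
        raise ValueError(f"checksum mismatch for {name}")
    return True
```

On GitHub Actions, `cache_dir` would be the directory that the workflow's cache step restores and saves. The same function would also work on Azure Pipelines if a cache directory is available there, but that part is untested here.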

It looks like we still need a general solution for azure pipelines though.

xylar commented 4 days ago

I think we really need to make it forbidden to download files from LCRC in CI. It's badly affecting our ability to do other work.

xylar commented 4 days ago

So I think even if we allow it in fewer circumstances, it's still not good enough.
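One way such a rule could be enforced in code, as a rough sketch only: refuse any download from the LCRC host when running under CI. It relies on GitHub Actions setting `CI=true` and Azure Pipelines setting `TF_BUILD=True`. The names `BLOCKED_HOSTS`, `running_in_ci`, and `check_download_allowed` are made up for illustration and do not exist in e3sm_diags.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hosts that CI jobs must not download from (assumed list, for illustration).
BLOCKED_HOSTS = {"web.lcrc.anl.gov"}


def running_in_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment looks like GitHub Actions or Azure Pipelines."""
    env = os.environ if env is None else env
    return (
        env.get("CI", "").lower() == "true"
        or env.get("TF_BUILD", "").lower() == "true"
    )


def check_download_allowed(url: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError if `url` points at a blocked host while running in CI."""
    host = urlparse(url).hostname
    if running_in_ci(env) and host in BLOCKED_HOSTS:
        raise RuntimeError(
            f"Downloading from {host} is not allowed in CI; "
            "use cached or containerized data instead."
        )
```

Calling `check_download_allowed(url)` before each request would make a CI job fail fast with a clear message rather than time out, while leaving manual runs on a workstation or on Chrysalis unaffected.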

mahf708 commented 4 days ago

I would like to make a container based on the official conda-forge miniconda container, then add the needed inputdata to it. I will put up a prototype on https://github.com/E3SM-Project/containers in the next few days (I need to collect info about the data needed).

tomvothecoder commented 4 days ago

@xylar Makes sense to me.

@mahf708 Let me know if you'd like me to test it out with e3sm_diags when it is ready.

mahf708 commented 3 days ago

Can we get @rljacob to weigh in just in case he prefers something else?

Rob, should we institute a policy that none of our testing should be touching the inputdata server? I doubt it is the sole reason we are seeing issues, but who knows...

I am happy to streamline a few containers with everything we need, so that we have no reason to download stuff from the server.

mahf708 commented 2 days ago

A resolution is offered in #901