OSIPI / TF2.4_IVIM-MRI_CodeCollection

OSIPI TF2.4: IVIM MRI code collection

Speed up the algorithm analysis workflow #53

Closed · AhmedBasem20 closed this 6 months ago

AhmedBasem20 commented 6 months ago

Fixes #43

These techniques speed up the workflow from ~25 minutes to ~15 minutes. See: https://github.com/AhmedBasem20/TF2.4_IVIM-MRI_CodeCollection/actions/runs/8149764438

AhmedBasem20 commented 6 months ago

@etpeterson I kept only the caching commit on this PR. I tried adding caching to the unit test workflow, but I'm not sure if this is possible: we have a matrix of different Python versions and operating systems. I think the installation step is necessary in this case and the cache will not help speed up the process. What do you think? Also, if you have any thoughts on further improving the analysis workflow, let me know.
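For reference, per-combination caching inside a matrix is possible with setup-python's built-in pip cache, which keys each OS/Python-version pair (plus the dependency-file hash) separately. A minimal sketch, assuming a requirements.txt and illustrative matrix values rather than the repo's actual unit_test.yml:

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]   # illustrative values
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'                            # built-in pip download cache
          cache-dependency-path: requirements.txt
      - run: pip install -r requirements.txt      # still runs; wheels come from the cache
      - run: python -m pytest
```

Note that this caches pip's download cache rather than the installed packages, so the installation step itself still runs, consistent with the point above.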

etpeterson commented 6 months ago

Now that I look into it more, I realize I'm less sure what we really want. To start, it seems to work and it speeds up the setup time, so that's good. I also see that we're already hitting the 10 GB cache limit on GitHub, so maybe cache optimization could be a future project.

That said, I notice you're double caching: the cache step and the python setup step. I found a post that suggests the opposite order is the way to go, I think because the cache restores the binaries and then the python setup doesn't need to do anything anymore.

I'm not sure if caching the binary is good or not. It seems potentially fragile - if anything changes then it could fail in strange ways. Because of this, I think the key should be as specific as possible to try to avoid matching incorrectly. Things like OS, python version, requirements file, and anything else you can think of.

I think caching both workflows is good - at least in theory. The caches are saved across runs so no problem there, even if it is the testing grid that's running. Cache size is an issue though, but maybe we can figure that out once we optimize caching itself.
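A minimal sketch of the kind of specific cache key described above, assuming an explicit actions/cache step over pip's download cache and a requirements.txt; the path and names are illustrative, not the repo's actual workflow:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip            # Linux default pip cache; the path differs on Windows/macOS
    # tie the cache to the OS, Python version, and the exact requirements file
    key: pip-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      pip-${{ runner.os }}-py${{ matrix.python-version }}-
```

Caching the download cache rather than the installed binaries is generally the less fragile option; restore-keys only reuse downloads, since pip still resolves requirements.txt on every run.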

AhmedBasem20 commented 6 months ago

Hi @etpeterson, the large cache size is because we create a pip cache 12 times in the unit test matrix, and it is not used. I see that the setup-python cache is not used in the analysis.yml workflow either, so I'll try your proposed solution (caching the binaries) and let you know how it goes.

For the unit_test.yml cache I still can't imagine how this could be done; if you have further thoughts or approaches, please let me know.

AhmedBasem20 commented 6 months ago

I noticed that we don't need the caching in python-setup for the analysis workflow, because we already cache the environment; the extra cache restoration was taking around 25 seconds. It is now removed.

P.S. I realized the importance of the cache step in the unit test workflow (the one with the large cache size), so I put it back.
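A sketch of the resulting analysis-workflow pattern, with assumed step names, Python version, and requirements file (not the exact analysis.yml contents): setup-python without its built-in cache, a cache of the installed environment, and an install step that only runs on a cache miss.

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: "3.11"            # illustrative; no 'cache: pip' here, avoiding the double caching
- uses: actions/cache@v4
  id: cache-env
  with:
    path: ${{ env.pythonLocation }}   # env.pythonLocation is set by setup-python, so this step comes after it
    key: analysis-${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}
- name: Install dependencies
  if: steps.cache-env.outputs.cache-hit != 'true'   # only install when the environment was not restored
  run: pip install -r requirements.txt
```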

etpeterson commented 6 months ago

> I'm not sure if caching the binary is good or not. It seems potentially fragile - if anything changes then it could fail in strange ways. Because of this, I think the key should be as specific as possible to try to avoid matching incorrectly. Things like OS, python version, requirements file, and anything else you can think of.

I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and we'll end up with some strange errors.

etpeterson commented 6 months ago

One other note, I see you're force pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force push, so try to avoid it if at all possible.

AhmedBasem20 commented 6 months ago

> I think this bit is still missing: being more specific about what we're caching. Right now, if the OS gets upgraded we won't see that in the key and we'll end up with some strange errors.

I added the OS type, and I see that env.pythonLocation already contains the Python version. For the OS version I don't see a straightforward way to get it. Even in the setup-python action, they display the OS version using this script for Linux only.
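As an illustration of that key shape (assumed names, not the exact contents of the PR):

```yaml
- uses: actions/cache@v4
  with:
    path: ${{ env.pythonLocation }}
    # runner.os covers the OS family (Linux/Windows/macOS) but not its version;
    # env.pythonLocation already embeds the Python version, e.g. .../Python/3.11.x/x64
    key: ${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}
```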

AhmedBasem20 commented 6 months ago

> One other note, I see you're force pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force push, so try to avoid it if at all possible.

I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to have a clean history, and then force push it. I thought this was a good practice (except for the last step, for sure). What do you think is an alternative way to keep a clean history while avoiding force pushing?

etpeterson commented 6 months ago

> One other note, I see you're force pushing all the time. That's a big red flag because it can rewrite history and do all kinds of other nastiness. You should almost never need to force push, so try to avoid it if at all possible.

> I am using it because I had a lot of debugging commits with names like fix, debug, etc. When I reach a solution, I rebase those commits into one commit with a descriptive name to have a clean history, and then force push it. I thought this was a good practice (except for the last step, for sure). What do you think is an alternative way to keep a clean history while avoiding force pushing?

Usually there's no need to push, so locally you can do basically anything you want: restarting, recreating branches, etc. In this case, though, you needed to push in order to run the workflows, so it makes sense that you had many commits. It's probably the worst-case scenario for git, actually.

I don't mind having other commits in there, but that's a personal preference and some people/repositories feel strongly one way or another.

The workflow you describe might be solved with a squash commit, which we can do at merge time on GitHub. It's an option I can select when merging.

Probably the cleanest, but slightly more effort, solution is to have a development branch and a final branch. Check out the development branch and iterate on it until it's good, then squash those commits and create the final branch from that. Any further changes start in the development branch until they're ready to be squashed and moved to the final branch. That way the final branch is clean of all the intermediate modifications.

I know git can sometimes be a hassle, but hopefully that helped. And if force pushes are truly necessary, that's fine, especially on your own repository. Just keep in mind that it's not always an option for every repository.