Closed MilesCranmer closed 2 years ago
For example, here are all the jobs running in parallel on my fork:
install.sh and environment.yml are now different for each algorithm. This is controlled by, for example, `# only: OperonRegressor` inside the environment.yml file. This seems to work okay.
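As a rough illustration of how such a marker could be applied, here is a minimal sketch. It assumes the marker is a trailing comment on a dependency line, that unmarked lines are shared by every algorithm, and that several names may be comma-separated; none of this is confirmed by the PR. The function name `filter_environment` and the package names in the example are hypothetical.

```python
import re

# Assumed marker format: a trailing "# only: Name1, Name2" comment on a line.
ONLY_MARKER = re.compile(r"#\s*only:\s*(?P<names>.+?)\s*$")


def filter_environment(text: str, algorithm: str) -> str:
    """Keep unmarked lines, plus marked lines that name `algorithm`."""
    kept = []
    for line in text.splitlines():
        match = ONLY_MARKER.search(line)
        if match is None:
            # No marker: the line applies to every algorithm.
            kept.append(line)
            continue
        names = [name.strip() for name in match.group("names").split(",")]
        if algorithm in names:
            kept.append(line)
    return "\n".join(kept)


# Illustrative environment.yml content; the package choices are made up.
EXAMPLE = """\
dependencies:
  - python
  - numpy
  - eigen  # only: OperonRegressor
  - julia  # only: PySRRegressor
"""

if __name__ == "__main__":
    print(filter_environment(EXAMPLE, "OperonRegressor"))
```

Filtering the file once per job would give each algorithm its own environment while keeping a single source file to review.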
These environments are cached successfully, which reduces build time even more.
Here are the tests running for each algorithm:
The file algorithms.yml lists the required install files for each algorithm. For example:
algorithms:
  - name: AFPRegressor
    install:
      - ellyn_install.sh
would mean ellyn_install.sh is run for AFPRegressor. No other installs are run because they are not needed for that build.
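The lookup described above could be sketched roughly as follows. To stay dependency-free, the config is written as the dictionary a YAML loader would be expected to return for the excerpt. The second entry, the helper names `install_scripts` and `build_matrix`, and the idea of feeding the result into a job matrix are assumptions for illustration, not the PR's actual code.

```python
import json

# Shape a YAML loader would be expected to produce for algorithms.yml.
# The AFPRegressor entry mirrors the excerpt; the second entry is hypothetical.
CONFIG = {
    "algorithms": [
        {"name": "AFPRegressor", "install": ["ellyn_install.sh"]},
        {"name": "OperonRegressor", "install": ["operon_install.sh"]},
    ]
}


def install_scripts(config: dict, algorithm: str) -> list:
    """Return only the install scripts listed for `algorithm`."""
    for entry in config["algorithms"]:
        if entry["name"] == algorithm:
            return list(entry.get("install", []))
    raise KeyError(f"unknown algorithm: {algorithm}")


def build_matrix(config: dict) -> dict:
    """Build a one-job-per-algorithm matrix (assumed layout)."""
    return {"algorithm": [entry["name"] for entry in config["algorithms"]]}


if __name__ == "__main__":
    print(install_scripts(CONFIG, "AFPRegressor"))
    print(json.dumps(build_matrix(CONFIG)))
```

Each parallel job would then run only the scripts returned for its own algorithm, which is what keeps the other installs out of that build.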
@lacava can you confirm the contents of environment.yml and algorithms.yml for me? Thanks.
Also note that this PR is rebased on the PySR one so there's a bunch of other code in this for now. It should be much smaller after the PySR one is through.
First of all, this is great @MilesCranmer , thanks for doing this!
Thanks for the review.
Definitely some bugs to work out. It looks like the tests that are "passing" are failing silently.
Thanks for pointing this out. Maybe I will just leave the environment.yml as-is then, it might be tricky to narrow down which models need what packages.
Indeed it might just be faster to have one single job as is done currently. I'll close this for now.
I see the value in testing in parallel, but I'm not sure about breaking up the environment setup (e.g., installing julia separately). Could lead to unanticipated package conflicts when everything is built together and all methods are tested in the same environment. A cached version of the whole environment would still be fast, I think.
Let's follow up about this in the PySR pull request #62. I am currently trying to add PySR to conda-forge: https://github.com/conda-forge/staged-recipes/pull/17605, but for now we could try using the conda-forge provided julia version, then include PyJulia and PySR as pip installs.
This is a draft PR to split the GitHub action by algorithm. Each algorithm will be built and tested in its own separate GitHub action. Since GitHub allows 25 simultaneous actions, this should speed up things greatly.
Draft todo's:
Note that PySR's PR (#62) is currently included here, but these commits will disappear after that one is merged.