Open TeaganKing opened 2 months ago
@dabail10 started to work on this and ran into a major stumbling block -- the ploomer
script runner doesn't offer an obvious way to change environments before running the script. Dave's example was being run in cupid-dev
, which doesn't have the necessary analysis packages installed. Lev will dig into the ploomer documentation and then I'll work with Dave to get a fix in place.
As a first pass, having looked through the Ploomber documentation, it doesn't look like there's a supported way to specify an environment for the ScriptRunner task. I'd recommend starting with looking at the documentation for ScriptRunner. If we wanted to go the route of putting in a PR to Ploomber or implementing our own ScriptRunner class, it actually looks like it might be a pretty straightforward update to their code, but that might be out of scope, in which case we should put together a more hacky solution in our own code.
Just for reference, the function which includes ScriptRunner is create_ploomber_script_task.
In the ScriptRunner documentation, it looks like we should be able to specify the environment in the configuration file using something like this, adapted for the script that you have (and I don't entirely remember the structure that you had in the config file thus far):
tasks:
- source: script.py
- product: output.txt
- env: cupid-analysis
So, specifying 'env' in an additional line should ideally work to allow for using the cupid-analysis
environment. Maybe we can test it out tomorrow?
Hmm, maybe I missed something - could you point me to where in the documentation or Ploomber source code that feature is mentioned?
Apologies, I may have jumped ahead without enough research on this... I saw this in an example, but after another look, I'm now thinking that the example was not accurate and it might not actually work in the format that I commented in the code snippet above. @rmshkv did you already have a particular place in the Ploomber code you were looking at to update?
It looks like the kernelspec_name
in NotebookRunner is set roughly here. Perhaps we could add an environment to this part of ScriptRunner?
Or, if we prefer to do something from our own side, could we possibly try something like this from within the script? I'm not sure what the best method is, but wanted to toss out some ideas.
import subprocess
subprocess.run("conda activate cupid-analysis", shell=True, check=True, executable='/bin/bash')
For reference, I'm including the NotebookRunner documentation and ScriptRunner documentation.
From @mnlevy1981 , we could also possibly use something like os.system("conda run -n cupid-analysis python script.py")
in run.py
to replace line 254 (call to create_ploomber_script_task
.
Some rough thoughts: I haven't tried any of these particular solutions, but in general, I think the first attempt should be along the lines of modifying Ploomber's code or implementing our own script runner task code if we don't want to contribute to that code base. It'd be nice to keep the interface of specifying the environment in config.yml rather than having to add code to the script itself, and it would also be best to keep the script running mechanism as a Ploomber task, since that task creation step isn't actually where the script code is being run (that happens later when the DAG is built, which takes into account dependencies between tasks if those exist, including between script and notebook tasks).
We'd like to be able to run a script rather than just a notebook with CUPiD.
A good test of this would be to run a sea ice script instead of a sea ice notebook.
This will require saving figures generated from the notebook and reading them in with a standard naming convention to an html file that could show the figures. See #16 for the html component of this.