hassanfa closed 4 years ago
There is documentation on what needs to go into either an app or a meta api here: https://github.com/Clinical-Genomics/cg/tree/master/documentation
That manual was a first draft of the development manual, and I would stick to the parts about cg. All other things are not reliable ;)
I would also suggest renaming the command to cg analysis balsamic start, so as to group all pipeline-specific commands under one subcommand.
Cool! I will read that. Is it the same way for mip and usalt? Cause if we have balsamic, usalt and mip in up, the command should be available within the same environment, right?
Balsamic has a step before start, which is creating the analysis config. It then uses the analysis config to start the analysis. For balsamic, the test specification is planned to be ready sometime in September. If you are interested, I can share the draft for that on AMsystems.
The documentation was made before we were putting multiple pipelines in production and is very MIP-centric. It does explain how the apps and meta apis work and how you're supposed to use them. It goes over all the apps and meta apis/interfaces that were in production at the beginning of 2018 and describes them, as well as how they tie everything together.
MIP also has steps before start. We handle them as follows: there is one subcommand for each step (e.g. panels, config, link). Each subcommand does only its own step and nothing else. We have one aggregate subcommand called auto that will execute all steps in the right order (e.g. panels, config, link, and then start).
So, it would be no problem to create a cg analysis balsamic config and cg analysis balsamic start as well as a cg analysis balsamic auto which would start all cases that are ready. Don't worry about the auto subcommand for now tho. We do have a cg analysis auto which starts all pending cases for all pipelines, so that is the one to concentrate on.
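To make the layout above concrete, here is a minimal sketch of one subcommand per step plus an aggregate auto subcommand. It uses argparse from the standard library rather than whatever cg actually uses, and the program name, run_step, and the case_id argument are all made up for illustration.

```python
import argparse

# Step names taken from the MIP example above; the order matters for "auto".
STEPS = ["panels", "config", "link", "start"]


def run_step(step: str, case_id: str) -> str:
    """Placeholder for the real work of a single step (hypothetical)."""
    return f"{step}:{case_id}"


def run_auto(case_id: str) -> list:
    """Run every step, in order, for one case."""
    return [run_step(step, case_id) for step in STEPS]


def build_parser() -> argparse.ArgumentParser:
    """Create one subcommand per step, plus the aggregate 'auto'."""
    parser = argparse.ArgumentParser(prog="analysis-sketch")
    subparsers = parser.add_subparsers(dest="step", required=True)
    for name in STEPS + ["auto"]:
        sub = subparsers.add_parser(name)
        sub.add_argument("case_id")
    return parser


def main(argv=None) -> list:
    """Dispatch to a single step, or to all of them for 'auto'."""
    args = build_parser().parse_args(argv)
    if args.step == "auto":
        return run_auto(args.case_id)
    return [run_step(args.step, args.case_id)]
```

Calling main(["config", "case1"]) runs only the config step, while main(["auto", "case1"]) runs all four in order. Keeping each step callable on its own is what makes it possible to rerun a single failed step by hand.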
One extra note: the apps are all still importing the Python API of the tool they are interfacing with. We have found that maintaining these apps interfaces is ... error prone. We are now trying out whether we can interface with the tools' CLI instead. The only apps interface we have ported to this methodology is loqus.
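As a rough illustration of interfacing with a tool's CLI instead of its Python API, a wrapper could look something like the sketch below. This is not the actual loqus interface in cg; the CliTool class and its run method are hypothetical names, and the example invokes the Python interpreter itself only so that it runs anywhere.

```python
import subprocess
import sys


class CliTool:
    """Thin wrapper that calls a tool's CLI instead of importing its Python API."""

    def __init__(self, binary: list):
        # Base command as a list, e.g. the tool's executable name (assumption).
        self.binary = list(binary)

    def run(self, *args: str) -> str:
        """Run the tool with the given arguments and return its stdout."""
        result = subprocess.run(
            self.binary + list(args),
            check=True,           # raise CalledProcessError on a non-zero exit code
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
        )
        return result.stdout


# Stand-in "tool": the current Python interpreter echoing its first argument.
echo_tool = CliTool([sys.executable, "-c", "import sys; print(sys.argv[1])"])
```

The appeal of this approach is that cg only depends on the tool's command line contract, so the tool can live in its own environment with its own dependencies.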
e.g. there already is a balsamic api (for fastq file handling): https://github.com/Clinical-Genomics/cg/tree/master/cg/apps/balsamic
The current CLI cg analysis does not have a balsamic subcommand tho. E.g. the Balsamic linking is handled in cg analysis link, which determines where to link automatically for now.
Cool! I will read that. Is it the same way for mip and usalt? Cause if we have balsamic, usalt and mip in up, the command should be available within the same environment, right?
What command are you referring to? If cg analysis balsamic [config|start], then yes, this one should be available in cg installed in the _main environment.
It might not end well having all of them installed in the main env. But we'll see how things change in the future. I'm not too keen on worrying about other workflows and packages and their conda dependencies... It can only end badly.
I think you guys should consider activating the conda env within cg instead. A subshell is already started via subprocess in Python, so an env activated there won't collide with others.
An example is to add conda envs to cg-stage.yaml and cg.yaml. Then, say for balsamic, add its conda env:
balsamic:
  root: /home/proj/production/cancer/cases
  env: complete_path_to_conda_env_P_BALSAMIC_191231
And just create a conda and shell executor for each API. This will make it easier to add new workflows and packages. This will also address all the rigidity in cg et al.
For example:
# create config and any prereq for workflow
cg analysis --config balsamic|mip|microsalt --...rest_of_stuff
# start a workflow:
cg analysis --start balsamic|mip|microsalt --...rest_of_stuff
This will use executors, say:
# a shell module or whatever
shell.config(workflow='balsamic', conda='complete_path_to_conda_env'...)
# a shell module or whatever
shell.start(workflow='balsamic', conda='complete_path_to_conda_env'...)
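One way such an executor might look, as a sketch only: it reads the env path from a config shaped like the YAML entry above and wraps the workflow command in conda run -p, which runs a command inside the env at a given prefix. The CONFIG dict, build_command, and execute are hypothetical names, and the assumption that the executable is named after the workflow is mine.

```python
import shlex
import subprocess

# Hypothetical in-memory config, shaped like the YAML entry above.
CONFIG = {
    "balsamic": {
        "root": "/home/proj/production/cancer/cases",
        "env": "complete_path_to_conda_env_P_BALSAMIC_191231",
    }
}


def build_command(workflow: str, args: list, config: dict = CONFIG) -> list:
    """Wrap a workflow command so it runs inside that workflow's conda env."""
    env_path = config[workflow]["env"]
    # Assumes the workflow's executable has the same name as the workflow.
    return ["conda", "run", "-p", env_path, workflow, *args]


def execute(workflow: str, args: list, config: dict = CONFIG, dry_run: bool = True) -> str:
    """Return the command line; only run it when dry_run is False."""
    command = build_command(workflow, args, config)
    if not dry_run:
        subprocess.run(command, check=True)  # raises on a non-zero exit code
    return shlex.join(command)
```

With this shape, adding a new workflow means adding one config entry rather than installing its dependencies into the shared environment.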
Most workflow managers have this functionality. I think we can get inspiration from similar solutions. Managing bash functions is a nightmare. I am actually thinking of dropping all of them, as they are an extra thing that I have to maintain and they create unnecessary complexity. :-) Of course, someone from prod might decide to take responsibility for those :-p
@hassanfa Still relevant?
I don't think so.
[draft for cg start balsamic]
I thought I should create this following @patrikgrenfeldt 's suggestion on having a cg start balsamic. I'd appreciate it if cg owners and ttn members could share here what cg needs, and I can share what balsamic needs.
This can serve as a precursor for bringing balsamic into production, and we can create the necessary user stories out of the thread and discussion.