Clinical-Genomics / cg

Glue between Clinical Genomics apps

Draft issue for cg start balsamic #402

Closed hassanfa closed 4 years ago

hassanfa commented 5 years ago

[draft for cg start balsamic]

I thought I should create this following @patrikgrenfeldt's suggestion of having a cg start balsamic. I'd appreciate it if cg owners and ttn members could share here what cg needs, and I will share what balsamic needs.

This can serve as a precursor for bringing balsamic into production, and we can create the necessary user stories out of this thread and discussion.

ingkebil commented 5 years ago

There is documentation on what needs to go into an app versus a meta API here: https://github.com/Clinical-Genomics/cg/tree/master/documentation

That manual was a first draft of the development manual, and I would stick to the parts about cg. Everything else in it is not reliable ;)

I would also suggest renaming the command to cg analysis balsamic start, so as to group all pipeline-specific commands under one subcommand.

hassanfa commented 5 years ago

Cool! I will read that. Is it the same for mip and usalt? Because if we have balsamic, usalt and mip all deployed, the command should be available within the same environment, right?

Balsamic has a step before start: creating the analysis config. The analysis config is then used to start the analysis. For balsamic, the test specification is planned to be ready sometime in September. If you are interested, I can share the draft on AMsystems.

ingkebil commented 5 years ago

The documentation was written before we had put multiple pipelines into production, so it is very MIP-centric. It does explain how the apps and meta APIs work and how you're supposed to use them. It covers all the apps and meta APIs/interfaces that were in production at the beginning of 2018, and describes them as well as how they tie everything together.

MIP also has steps before start. We handle them as follows: there is one subcommand for each step (e.g. panels, config, link), and each subcommand does only that one step. We have one aggregate subcommand called auto that executes all steps in the right order (e.g. panels, config, link, and then start). So it would be no problem to create cg analysis balsamic config and cg analysis balsamic start, as well as a cg analysis balsamic auto that would start all cases that are ready. Don't worry about the auto subcommand for now, though. We already have a cg analysis auto that starts all pending cases for all pipelines, so that is the one to concentrate on.
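The one-subcommand-per-step layout, with an aggregate auto that runs every step in order, can be sketched with Python's standard argparse. This is a minimal illustration, not cg's actual CLI code: config_case, start_case and pending_cases are hypothetical stand-ins that only return strings, and only the config and start steps discussed for balsamic are modelled.

```python
import argparse
from typing import Callable, Dict, List

# Hypothetical step implementations: a real version would call BALSAMIC.
def config_case(case_id: str) -> str:
    return f"config {case_id}"

def start_case(case_id: str) -> str:
    return f"start {case_id}"

# Hypothetical lookup of cases that are ready to be analysed.
def pending_cases() -> List[str]:
    return ["caseA", "caseB"]

# One subcommand per step, listed in the order "auto" should run them.
STEPS: Dict[str, Callable[[str], str]] = {"config": config_case, "start": start_case}

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cg")
    groups = parser.add_subparsers(dest="group", required=True)
    analysis = groups.add_parser("analysis")
    pipelines = analysis.add_subparsers(dest="pipeline", required=True)
    balsamic = pipelines.add_parser("balsamic")
    commands = balsamic.add_subparsers(dest="command", required=True)
    for name in STEPS:
        # Each single-step subcommand does only its own step for one case.
        commands.add_parser(name).add_argument("case_id")
    # "auto" takes no case: it runs every step for every pending case.
    commands.add_parser("auto")
    return parser

def run(argv: List[str]) -> List[str]:
    args = build_parser().parse_args(argv)
    if args.command == "auto":
        return [step(case) for case in pending_cases() for step in STEPS.values()]
    return [STEPS[args.command](args.case_id)]
```

For example, run(["analysis", "balsamic", "config", "case1"]) runs only the config step, while run(["analysis", "balsamic", "auto"]) runs config and then start for each pending case. Keeping each step in its own function means auto is just an ordered loop over the same functions the single-step subcommands use.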

One extra note: the apps all still import the Python API of the tool they interface with. We have found that maintaining these app interfaces is ... error-prone. We are now trying out whether we can interface with the tools' CLIs instead. So far, the only app interface we have ported to this approach is loqus.
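Interfacing through a tool's CLI instead of importing its Python API could look roughly like the sketch below, which shells out with the standard subprocess module. The CliTool class is hypothetical and is not the loqus interface in cg; it only shows the general pattern.

```python
import subprocess
import sys
from typing import List

class CliTool:
    """Hypothetical wrapper that talks to a tool through its command line."""

    def __init__(self, binary: str) -> None:
        self.binary = binary

    def run(self, args: List[str]) -> str:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so failures surface instead of being silently ignored.
        result = subprocess.run(
            [self.binary, *args], capture_output=True, text=True, check=True
        )
        return result.stdout

# Usage, with the Python interpreter standing in for a real tool binary:
tool = CliTool(sys.executable)
print(tool.run(["-c", "print('hello from the CLI')"]), end="")
```

The wrapper depends only on the tool's command-line contract, so cg would no longer need the tool's Python package (and its dependencies) installed in its own environment.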

ingkebil commented 5 years ago

For example, there already is a balsamic API (for FASTQ file handling): https://github.com/Clinical-Genomics/cg/tree/master/cg/apps/balsamic

The current cg analysis CLI does not have a balsamic subcommand, though. For example, balsamic linking is handled by cg analysis link, which for now determines automatically where to link.

ingkebil commented 5 years ago

> Cool! I will read that. Is it the same way for mip and usalt? Cause if we have balsamic, usalt and mip in up, the command should be available within the same environment, right?

What command are you referring to? If you mean cg analysis balsamic [config|start], then yes, it should be available in the cg installed in the _main environment.

hassanfa commented 5 years ago

Having all of them installed in the main environment might not end well, but we'll see how things change in the future. I'm not too keen on worrying about other workflows and packages and their conda dependencies... It can only end badly.

hassanfa commented 5 years ago

I think you guys should consider activating the conda environment from within cg instead. A subshell is already spawned via subprocess in Python, so it won't collide with other environments.
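A minimal sketch of running a command inside a workflow's own conda environment from Python, assuming a conda version that provides conda run with the --prefix option. The function names are hypothetical; the point is that each call is a separate child process, so cg's environment is never modified.

```python
import subprocess
from typing import List, Sequence

def conda_run_command(env_prefix: str, command: Sequence[str]) -> List[str]:
    """Build a command that runs `command` inside the conda env at env_prefix.

    `conda run --prefix` runs the command in that environment without
    activating it in the calling shell, so cg's own environment is untouched.
    """
    return ["conda", "run", "--prefix", env_prefix, *command]

def run_in_env(env_prefix: str, command: Sequence[str]) -> str:
    # Each call is a separate child process, so environments never collide.
    result = subprocess.run(
        conda_run_command(env_prefix, command),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Building the argument list separately from executing it keeps the command construction easy to unit-test without conda installed.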

For example, add the conda environments to cg-stage.yaml and cg.yaml. For balsamic, that would mean adding its conda env:

```yaml
balsamic:
  root: /home/proj/production/cancer/cases
  env: complete_path_to_conda_env_P_BALSAMIC_191231
```

Then just create a conda and shell executor for each API. This would make it easier to add new workflows and packages, and would also address the rigidity in cg et al.
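Reading the proposed per-workflow section into a typed object could look like the sketch below. The WorkflowSettings class and load_workflow function are hypothetical, and the CONFIG dict stands in for what yaml.safe_load would return for the balsamic snippet (loading the file itself is left out so the example needs only the standard library).

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class WorkflowSettings:
    """Settings for one workflow, mirroring the proposed YAML section."""
    root: str  # where case directories live
    env: str   # full path to the workflow's conda environment

def load_workflow(config: Dict[str, Any], workflow: str) -> WorkflowSettings:
    # Fail early with a clear message if the workflow section is missing.
    try:
        section = config[workflow]
    except KeyError:
        raise KeyError(f"no '{workflow}' section in the cg config") from None
    return WorkflowSettings(root=section["root"], env=section["env"])

# Stand-in for the parsed balsamic section of cg.yaml.
CONFIG = {
    "balsamic": {
        "root": "/home/proj/production/cancer/cases",
        "env": "complete_path_to_conda_env_P_BALSAMIC_191231",
    }
}
```

Adding a new workflow would then only mean adding a new YAML section with the same two keys.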

hassanfa commented 5 years ago

For example:

```
# create config and any prerequisites for the workflow
cg analysis --config balsamic|mip|microsalt --...rest_of_stuff
# start a workflow:
cg analysis --start balsamic|mip|microsalt --...rest_of_stuff
```

This would use executors, for example:

```
# a shell module or whatever
shell.config(workflow='balsamic', conda='complete_path_to_conda_env'...)
# a shell module or whatever
shell.start(workflow='balsamic', conda='complete_path_to_conda_env'...)
```

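One way the proposed shell.config and shell.start executors could be made concrete is a small per-workflow class, sketched below under loud assumptions: WorkflowExecutor is hypothetical, it assumes conda run --prefix is available, and the "config" and "start" arguments passed to the workflow's own CLI are placeholders, since each real tool names these steps differently.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], str]

def subprocess_runner(command: List[str]) -> str:
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

@dataclass
class WorkflowExecutor:
    """Hypothetical executor: one instance per workflow (balsamic, mip, ...)."""
    workflow: str      # name of the workflow's own CLI binary
    conda_env: str     # full path to that workflow's conda environment
    runner: Runner = subprocess_runner

    def _run(self, step: Sequence[str]) -> str:
        # Run the workflow CLI inside its own conda env, never cg's.
        command = ["conda", "run", "--prefix", self.conda_env, self.workflow, *step]
        return self.runner(command)

    def config(self, *args: str) -> str:
        # Step 1 (placeholder name): create the analysis config and prerequisites.
        return self._run(["config", *args])

    def start(self, *args: str) -> str:
        # Step 2 (placeholder name): start the analysis from that config.
        return self._run(["start", *args])
```

The runner is injectable, so the command construction can be tested with a fake runner, for example WorkflowExecutor("balsamic", "/envs/P_BALSAMIC", runner=" ".join), without conda or the workflow installed.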

hassanfa commented 5 years ago

Most workflow managers have this functionality, and I think we can take inspiration from similar solutions. Managing bash functions is a nightmare. I am actually thinking of dropping all of them, as they are one more thing I have to maintain and they create unnecessary complexity. :-) Of course, someone from prod might decide to take responsibility for those :-p

emiliaol commented 4 years ago

@hassanfa Still relevant?

hassanfa commented 4 years ago

I don't think so.