ga4gh / fasp-scripts

Apache License 2.0

Add script that runs parts of a single workflow with different files on different WES implementations #11

Open mbarkley opened 3 years ago

mbarkley commented 3 years ago

Motivation

A long-standing goal of the FASP GA4GH group has been to have a federated workflow demo that anyone can run. In this context, a federated workflow means:

  1. Multiple compute environments are used for a single computational analysis
  2. Different data is analysed in each compute environment
  3. A single script drives the analysis using GA4GH standards, where possible

Goals

  1. Automate as much of this task as possible in scripts in this repo
  2. Use the WES API to control and monitor workflows
  3. Use public or synthetic data to make this more accessible for other people to try
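
As a rough illustration of the pattern described above (one script, several WES endpoints, different data per endpoint), here is a minimal Python sketch. It is not code from this repo. The form field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`) and the terminal states come from the GA4GH WES specification; everything else is an assumption made for illustration: the `Site` class, the `input_file` parameter name, the WDL type/version, and the `submit` / `get_state` callables, which stand in for whatever HTTP client or existing WES client class a real script would use.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Terminal run states defined by the GA4GH WES specification.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


@dataclass
class Site:
    """One compute environment plus the data it should analyse (hypothetical)."""
    name: str        # label for reporting, e.g. "site-a"
    wes_url: str     # base URL of the WES service (placeholder)
    input_file: str  # data assigned to this site, e.g. a DRS URI (placeholder)


def build_run_request(site: Site, workflow_url: str) -> Dict[str, str]:
    """Form fields for POST {wes_url}/runs, using field names from the WES spec.

    The "input_file" key inside workflow_params is an assumed workflow input.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": "WDL",          # assumption: a WDL workflow
        "workflow_type_version": "1.0",  # assumption
        "workflow_params": json.dumps({"input_file": site.input_file}),
    }


def run_federated(
    sites: List[Site],
    workflow_url: str,
    submit: Callable[[Site, Dict[str, str]], str],
    get_state: Callable[[Site, str], str],
    poll_seconds: float = 30.0,
) -> Dict[str, str]:
    """Submit the same workflow to every site, then poll until all runs finish.

    submit(site, fields) should POST to {wes_url}/runs and return the run_id.
    get_state(site, run_id) should GET {wes_url}/runs/{run_id}/status and
    return the "state" string. Both are supplied by the caller.
    """
    by_name = {site.name: site for site in sites}
    run_ids = {
        site.name: submit(site, build_run_request(site, workflow_url))
        for site in sites
    }
    states = {name: "UNKNOWN" for name in run_ids}

    while any(state not in TERMINAL_STATES for state in states.values()):
        for name, run_id in run_ids.items():
            if states[name] not in TERMINAL_STATES:
                states[name] = get_state(by_name[name], run_id)
        if any(state not in TERMINAL_STATES for state in states.values()):
            time.sleep(poll_seconds)
    return states
```

Keeping the transport as two injected callables means the same driver loop could sit on top of different per-platform WES clients, each handling its own authentication, without the loop needing to know about them.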

Todo

mbarkley commented 3 years ago

There is a Seven Bridges WES script now thanks to #9

ianfore commented 3 years ago

Great proposal. I think there's even value in running something against more than one Seven Bridges implementation. I'd like to revive this model of looking at a FASP script to highlight which aspects of federation it hits. In this case it would address those in bold. It also gives us the option of testing federated authentication and authorization early.

Your list above would check all those.

ianfore commented 3 years ago

I have updated FASPScript17, which runs a compute job on TCGA and GTEx data in a single script. The two datasets come from different repositories and driver projects.

See GTEX_TCGA_Federated_Analysis notebook

mbarkley commented 3 years ago

There's now a PR sent to fix the DNAstack WES client. Tomorrow I'll start tinkering with an ELIXIR script.

ianfore commented 3 years ago

Merged the PR. Now thinking of tinkering with running samtools via the DNAstack WES. The problem we hit with that last summer seemed to be the "requester pays" buckets.

Added DNAStackWESTour notebook to explore some more.

It looks like we may still have the requester pays problem, but otherwise the server looks in good shape.

mbarkley commented 3 years ago

I've sent PR #19 with an ELIXIR WES client implementation. I don't think I'll get to write a script using all the WES clients this week for a single federated workflow, but I think we're a lot closer now.

ianfore commented 2 years ago

Raising the question of whether the Federated VUS notebook addresses the intent of this issue. The link shown is the notebook that aggregates the results. Notebooks in the same folder show the same workflow running on three different instances of the Seven Bridges platform.

This...

  1. Demonstrates the concept and checks the boxes in the 15 Jan 2021 comment above.
  2. Is less convincing than running it on three different technical platforms

Barriers to the latter are

Proposing that we close this issue and address those barriers, perhaps via issues in this repo that are specific to each barrier, or by other means.

Thoughts @mbarkley ?