ga4gh / fasp-scripts

Apache License 2.0

Use CloudOS and NextFlow to integrate TCGA and GTEx data #14

Open ianfore opened 3 years ago

ianfore commented 3 years ago

See GTEX_TCGA_Federated_Analysis notebook for an iPython workflow.

The overall flow of the script/notebook is probably more illustrative than directly usable in NextFlow. A good question is how to do the equivalent work from CloudOS and NextFlow where that is the platform of preference, e.g. as it is for the JAX team. If you are interested in exploring that, we could work together to get started.

I have no experience of writing NextFlow, so I'll leave that to you. Nevertheless, the nf script that Sangram shared previously gave me a sense of how you structure things: https://github.com/lifebit-ai/sra-dbgap-datafetch/blob/main/main.nf

One of the questions is which capabilities it makes sense to implement within NextFlow, and which should live in a library that NextFlow calls. Either way, I think we could do some useful experimentation. Some of the fasp package may be useful; if not, I could modify it to help.

See #15 for a possible additional step.

sk-sahu commented 3 years ago

Hi @ianfore

For the JAX research team, to get GTEx data from AnVIL-GEN3 we built a workaround in Nextflow that uses the signed URLs returned by a modified get_drs_url.py (from fasp-scripts).

An extrapolated example Nextflow pipeline can be found here - https://github.com/lifebit-ai/drs-nf

Although it works, in terms of code design the transition from fasp-scripts to Nextflow is not perfect. We can take this as an action point.
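For readers unfamiliar with the signed-URL step mentioned above, here is a minimal sketch of how a DRS object ID can be resolved to a download URL using the standard GA4GH DRS v1 endpoints. It is not the actual get_drs_url.py from fasp-scripts or the modified version used in drs-nf. The function names, the `fetch` hook, and the base URL and token arguments are all assumptions made for illustration, and the object ID is placed in the path as-is.

```python
import json
from urllib.request import Request, urlopen

# Standard path prefix defined by the GA4GH DRS v1 specification.
DRS_PREFIX = "/ga4gh/drs/v1/objects"


def http_get_json(url, token=None):
    """GET a URL and parse the JSON body; sends a bearer token if one is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    with urlopen(Request(url, headers=headers)) as resp:
        return json.load(resp)


def resolve_signed_url(base_url, object_id, token=None, fetch=http_get_json):
    """Return a download URL for a DRS object (hypothetical helper).

    base_url and token are placeholders for whatever DRS server and
    credentials apply; fetch is injectable so the logic can be tested offline.
    """
    object_url = f"{base_url.rstrip('/')}{DRS_PREFIX}/{object_id}"
    drs_object = fetch(object_url, token)

    for method in drs_object.get("access_methods", []):
        # Some servers return a usable URL directly on the access method.
        access_url = method.get("access_url")
        if access_url and access_url.get("url"):
            return access_url["url"]
        # Otherwise the access_id must be exchanged for a (signed) URL.
        access_id = method.get("access_id")
        if access_id:
            access = fetch(f"{object_url}/access/{access_id}", token)
            return access["url"]

    raise LookupError(f"No usable access method for DRS object {object_id}")
```

A Nextflow process could call a script like this once per DRS ID and pass the resulting URL to a download step, which is one way to keep the DRS logic in a library rather than in the pipeline itself.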

ianfore commented 3 years ago

Thanks @sk-sahu for the links. I'd highlight what seem to me some key issues.

To my mind those would fit with your wish for better code design. You may have others.

What I would advocate more strongly is that, rather than in theory, we all explore these issues in working code examples, hackathon style!

ianfore commented 3 years ago

Possible to-dos: