Sentieon / sentieon-google-genomics

Run Sentieon pipelines on Google Cloud Platform
https://cloud.google.com/genomics/docs/tutorials/sentieon
BSD 2-Clause "Simplified" License

How to run this Sentieon DNAseq in GCP #1

Closed. Rokshan2016 closed this issue 4 years ago

Rokshan2016 commented 4 years ago

Hi @DonFreed,

I tried this pipeline (https://cloud.google.com/life-sciences/docs/tutorials/sentieon), launching it from my local machine to GCP, and it works well. Now I want to launch it from within GCP. Can you please guide me on this?

Thanks! Rokshan

DonFreed commented 4 years ago

Hi Rokshan,

I'm glad you are having a good experience with the pipeline!

When you launch the pipeline from your local machine, most of the computation actually runs on GCP. After starting the job on your local machine, you can track the launched job from the Life Sciences and Compute Engine dashboards.

-Don

Rokshan2016 commented 4 years ago

@DonFreed I want to run it from within GCP, e.g. GKE or Data Fusion. Is that possible?

DonFreed commented 4 years ago

This repository runs the Sentieon pipelines using GCP's Life Sciences API.

You can launch these Life Sciences pipelines from other GCP offerings (such as GCE or GKE). For example, it is possible to launch the pipeline from a standard virtual machine running on Google Compute Engine (GCE). Just follow Google's documentation to create and start a small virtual machine, SSH into the new instance, and then follow the Sentieon tutorial to set up the instance and launch the pipeline(s). The pipeline will launch from the GCE instance, but the computation will be handled by the Life Sciences API.

You could also run Sentieon DNAseq outside of the Life Sciences API. I'm not too familiar with Data Fusion, but it is possible to run the Sentieon DNAseq software on top of other GCP offerings (such as GKE). Please feel free to send us a message on our support page if you want to run the Sentieon software on the Google Cloud Platform outside of the Life Sciences API.
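As a rough illustration of the "create a small VM, then SSH in" steps, here is a minimal Python sketch that only assembles and prints the two `gcloud` commands rather than running them. The instance name, zone, and machine type are hypothetical placeholders, not values from this repository. The `--scopes=cloud-platform` flag is an assumption: the VM's service account may need broader API access to submit Life Sciences jobs, so check Google's documentation for what your project requires.

```python
import shlex


def gce_launcher_commands(instance, zone, machine_type="e2-small"):
    """Build (but do not run) gcloud commands for a small launcher VM.

    All argument values are placeholders chosen for illustration.
    """
    create = [
        "gcloud", "compute", "instances", "create", instance,
        "--zone", zone,
        "--machine-type", machine_type,
        # Assumption: a broad access scope so the VM can call other GCP APIs.
        "--scopes=cloud-platform",
    ]
    ssh = ["gcloud", "compute", "ssh", instance, "--zone", zone]
    return [create, ssh]


# Print the commands so they can be reviewed before running them by hand.
for cmd in gce_launcher_commands("sentieon-launcher", "us-central1-a"):
    print(shlex.join(cmd))
```

Once logged in to the instance, the remaining steps are the same as in the Sentieon tutorial: set up the instance and launch the pipeline from there.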

Rokshan2016 commented 4 years ago

@DonFreed Thanks for the information. I will try to deploy it in GKE.

Rokshan2016 commented 4 years ago

@DonFreed Is there any tool in GCP that we can use to create a report from a VCF?

DonFreed commented 4 years ago

This functionality is not available in the Sentieon software, which is more focused on the computationally intensive portion of bioinformatics pipelines (usually FASTQ to VCF).

You might try searching/asking in some other forums such as Biostars for suggested tools that can be used to generate a report from a VCF.
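For a very basic summary in the meantime, a few lines of Python can count the records in a VCF. The sketch below is not a Sentieon or GCP tool; it uses only the standard library, relies on the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...), and classifies alleles in a deliberately simplified way. The function names and the sample records are made up for illustration.

```python
import collections
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def classify(ref, alt):
    """Simplified allele classification based only on REF/ALT lengths."""
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "OTHER"  # missing, spanning deletion, symbolic, or breakend
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "INDEL"


def summarize_vcf(lines):
    """Return (alleles per type, records per chromosome) for VCF lines."""
    by_type = collections.Counter()
    by_chrom = collections.Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alts = fields[0], fields[3], fields[4]
        by_chrom[chrom] += 1
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            by_type[classify(ref, alt)] += 1
    return by_type, by_chrom


# Made-up records, only to show the call; use open_vcf(path) for a real file.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n",
    "chr2\t300\t.\tC\tT,CA\t50\tPASS\t.\n",
]
by_type, by_chrom = summarize_vcf(example)
print(dict(by_type))
print(dict(by_chrom))
```

Anything beyond simple counts (annotation, quality metrics, formatted reports) is better handled by a dedicated tool.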

Rokshan2016 commented 4 years ago

@DonFreed I was trying to run this in GKE, but I am getting an error.

I created a Docker image from the Dockerfile in the pipeline script directory. When I deploy the image in GKE, the deployment fails. Any suggestions? Do you have a sample showing how to deploy it in GKE?

Normal   Pulling  12m (x5 over 14m)     kubelet, gke-mllp-adapter-default-pool-86e81d4d-1tvg  pulling image "gcr.io/project-id/sention"
Warning  BackOff  4m18s (x47 over 13m)  kubelet, gke-mllp-adapter-default-pool-86e81d4d-1tvg  Back-off restarting failed container

DonFreed commented 4 years ago

Hi @Rokshan2016,

I'm happy to help you run the Sentieon tools on GKE, but that's out of the scope of this project, which is geared towards running the Sentieon tools using the Google Life Sciences API. Please send a message to us on our support page.