CCRGeneticsBranch / Oncogenomics_NF_WF

https://ccrgeneticsbranch.github.io/Oncogenomics_NF_WF/

Nextflow MVP #12

Closed vinegang closed 1 year ago

vinegang commented 1 year ago
  1. Completed 95% of the MVP on Biowulf.
  2. Pending items:
     1. Current status: Because the test samples (Test1, Test2, Test3) are tiny, most of the outputs are empty. Testing the MVP on Biowulf with sample CL0086 to make sure all the steps execute correctly.
     2. Testing the MVP in parallel on an EC2 instance.
vinegang commented 1 year ago

Error while processing test samples through the EC2 Batch instance.

nxf-scratch-dir ip-10-209-132-38.nci.nih.gov:/tmp/nxf.VZH53uAngQ
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_PCG_042616.txt to annotation/hg19_PCG_042616.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_caddindel.txt to annotation/hg19_caddindel.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_cadd.txt to annotation/hg19_cadd.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

Proposed solution from Kevin: modify maxParallelTransfers in the Nextflow config. See this GitHub issue: https://github.com/nextflow-io/nextflow/issues/1107

vinegang commented 1 year ago

Had a quick call with Kevin. Adding errorStrategy = 'retry' and aws.batch.maxParallelTransfers = 5 to the pipeline and testing on EC2.
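
For reference, the two settings above could look roughly like this in nextflow.config. This is a sketch, not the committed change: the maxRetries value is an assumption, and applying the retry strategy to every process (rather than to selected ones) is also an assumption.

```groovy
// nextflow.config (sketch)

process {
    // Re-submit a task when it fails, e.g. on a reset S3 connection.
    errorStrategy = 'retry'
    maxRetries    = 3        // assumed value, not stated in this thread
}

aws {
    batch {
        // Limit concurrent S3 transfers per job to reduce
        // "Connection reset by peer" failures during staging.
        maxParallelTransfers = 5
    }
}
```

Lowering maxParallelTransfers trades some staging speed for fewer simultaneous connections to S3, while the retry strategy covers the transient failures that still occur.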

vinegang commented 1 year ago

Added the hg_refGene.txt file to the S3 bucket; testing the ANNOVAR process on EC2.

vinegang commented 1 year ago

Added --strandedness option to the RSEM process
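
A minimal sketch of how a --strandedness value might be passed to RSEM from a Nextflow DSL2 process. The process name, the channel shapes, params.strandedness, rsem_prefix, and the paired-end, alignment-based invocation are all assumptions for illustration; the actual process in the repository may differ.

```groovy
// Hypothetical DSL2 process; names and inputs are illustrative only.
params.strandedness = 'none'   // RSEM accepts: none, forward, reverse

process Rsem {
    input:
    tuple val(sample), path(transcriptome_bam)
    path rsem_index_dir      // directory holding an RSEM reference
    val  rsem_prefix         // reference name inside that directory

    output:
    tuple val(sample), path("${sample}.genes.results")

    script:
    """
    rsem-calculate-expression \\
        --alignments --paired-end --no-bam-output \\
        --strandedness ${params.strandedness} \\
        -p ${task.cpus} \\
        ${transcriptome_bam} ${rsem_index_dir}/${rsem_prefix} ${sample}
    """
}
```

The right value depends on the library preparation of each sample, so it would typically come from sample metadata rather than a single global default.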

vinegang commented 1 year ago

Completed testing the MVP end to end on ec2 instance.

vinegang commented 1 year ago

Pushed the RSEM --strandedness change and completed testing on Biowulf and AWS: f180c2664bc6a81ec1ec470576a746ae0355408a

vinegang commented 1 year ago

Notes on CCDI data:

MCI data is all Exome & Methylation
Matched Data - WGS, Exome and RNAseq

vinegang commented 1 year ago

Sample Count | Patient ID | Library ID | Diagnosis | Sequencing methods | Reason for selection | DNA sequencing
-- | -- | -- | -- | -- | -- | --
1 | SJ031111 | SJEPD031111_D1_T_H2HL5BBXY | Ependymomal Tumor | ribozero | deep sequencing | no
2 | SJ030446 | SJMB030446_D1_T_HK7CFBBXX | Large Cell/Anaplastic Medulloblastoma | ribozero | deep sequencing | no
3 | PANBMJ | NBL_PANBMJ_T1R_T_C1NELACXX | neuroblastoma | polya | deep sequencing, DHH-RHEBL1 fusion | exome
4 | PANKFE | NBL_PANKFE_T1R_T_81NC5ABXX | neuroblastoma | polya | deep sequencing, DHH-RHEBL1 fusion | exome
5 | TC248 | TC248seq_T_D2AHCACXX | Ewing sarcoma | ribozero | EWSR1-FLI1 | exome
6 | EWS104 | EWS104tumor_T_C1NELACXX | Ewing sarcoma | polya | EWSR1-FLI1 | exome
7 | NCI0064 | NCI0064tumor_T_C291CACXX | rhabdomyosarcoma | ribozero | PAX3-FOXO1 | exome
8 | RMS248 | RMS248_C14C7ACXX | rhabdomyosarcoma | polya | PAX3-FOXO1 | exome
9 | NCI0243 | NCI0243_T_T_HHC2JBGXX | Osteosarcoma | polya_stranded |   | exome
10 | NCI0296 | NCI0296_T1R_T_H5VGLBGXY | Desmoplastic small round cell tumor | polya_stranded | EWSR1-WT1 | exome
11 | NCI0263 | NCI0263_T4R_T_HWMY2BGXX | Melanoma | polya_stranded |   | exome
12 | CL0263 | CL0263_T1R_T_H7WNMBGXB | rhabdomyosarcoma | access | PAX3-FOXO1 | exome
13 | NCI0246 | NCI0246_T2R_T2_HCNWGBGX7 | Methothelioma peritoneal | access | STRN-ALK | exome
14 | NCI0246 | NCI0246_T1R_T_H5VGLBGXY | Methothelioma peritoneal | polya_stranded | STRN-ALK | exome
15 | CL0187 | CL0187_T1R_T2_HCNMNBGX7 | Endometrial stromal sarcoma | access | JAZF1-SUZ12 | exome
16 | CHLADSRCTII | CHLADSRCTII_T1R_T_HCY3KBGXG | Desmoplastic small round cell tumor | access | EWSR1-WT1 | exome
17 | PATADR | RMS2163_T_HKY3VBGX5 | rhabdomyosarcoma | SmartRNA | PAX3-FOXO1 | no
18 | RMS2163 | RMS2163_T1R_T3_HCNMNBGX7 | rhabdomyosarcoma | access | PAX3-FOXO1 | no
19 | RMS2207 | RMS2207_T2R_T2_HKWGGBGX5 | rhabdomyosarcoma | access | PAX7-FOXO1 | panel
20 | RMS2207 | RMS2207_T_HKY3VBGX5 | rhabdomyosarcoma | SmartRNA | PAX7-FOXO1 | panel

vinegang commented 1 year ago

Added multiqc and completed issue 26

Tested the pipeline on biowulf and AWS

vinegang commented 1 year ago

Completed #8. Now working on adding sampleID to the output path.

vinegang commented 1 year ago

Added casename as an input parameter and updated the publishDir path for all the processes.
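
As an illustration of a casename-based output path, the publishDir setting could be expressed once in nextflow.config instead of per process. This is a sketch under assumptions: params.resultsdir and the directory layout are hypothetical names, not taken from the repository.

```groovy
// nextflow.config (sketch; params.resultsdir is a hypothetical name)

params {
    casename   = null        // set per run on the command line
    resultsdir = 'results'   // assumed base output directory
}

process {
    // Applies to every process; the closure is evaluated per task,
    // so task.process resolves to the name of the running process.
    publishDir = [
        path: { "${params.resultsdir}/${params.casename}/${task.process}" },
        mode: 'copy'
    ]
}
```

With this in place, a run started with `--casename <case>` would publish each process's outputs under `<resultsdir>/<case>/<process name>`.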

vinegang commented 1 year ago

Hsien Chao generated the master file for our CCDI project: s3://ccr-oncogenomics-ccdi-dev/metadata/master_file.tsv. I will update it with the information for the 20 samples.

vinegang commented 1 year ago

Completed processing two samples on AWS. Launching the pipeline on the remaining samples.

vinegang commented 1 year ago

Updated the memory resources for AWS and launched the pipeline on the remaining samples.

vinegang commented 1 year ago

Access error at the Fastqc step. Reached out to Kevin; waiting for more information from him.

Error executing process > 'Fastqc (NCI0296)'
--
Caused by:
Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 9FKXR9DZRP4CFDMX; S3 Extended Request ID: m3fSxI9wCtEE8SkLAYBjnDJyNNULhXetKt5tX0XCvDl0GOg4+b3nE59Jnw9iMOG4I9Jf6PA0rpQ=; Proxy: null)
vinegang commented 1 year ago

I had a call with Kevin from AWS. It seems we don't have permission to write pipeline results to s3://ccr-oncogenomics-ccdi-dev/. This is an AWS limitation; he suggested we move the processed_data directory to agc-424336837382-us-east-1. This will allow me to process the samples on AWS.

vinegang commented 1 year ago

Kevin suggested we mount the AGC S3 bucket for processed results. Meanwhile, I copied the results for two samples to s3://ccr-oncogenomics-ccdi-dev/ for visualization.

vinegang commented 1 year ago

-----w---- 1 kopardevn     khanlab     58403393 Jan 20 12:16 1000gSites4genotyping.v2.bed  

/vf/users/Clinomics/Ref/khanlab/annotation/1000gSites4genotyping.v2.bed

@kopardev, can you add read permissions for the group?

vinegang commented 1 year ago

The goals of the project changed. We decided to expand the MVP to the exome pipeline. Closing the issue.