Closed: vinegang closed this issue 1 year ago
Error while processing test samples through an EC2 Batch instance:
nxf-scratch-dir ip-10-209-132-38.nci.nih.gov:/tmp/nxf.VZH53uAngQ
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_PCG_042616.txt to annotation/hg19_PCG_042616.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_caddindel.txt to annotation/hg19_caddindel.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
download failed: s3://ccr-genomics-testdata/References/GRCh37/annotation/hg19_cadd.txt to annotation/hg19_cadd.txt ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Solution proposed by Kevin: modify maxParallelTransfers in the Nextflow config. See this GitHub issue: https://github.com/nextflow-io/nextflow/issues/1107
Had a quick call with Kevin. Adding errorStrategy = 'retry' and aws.batch.maxParallelTransfers = 5 to the pipeline and testing on EC2.
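For reference, the two settings discussed on the call could look roughly like this in nextflow.config. This is a sketch, not the pipeline's actual config: applying errorStrategy to all processes and the maxRetries value are assumptions that were not discussed in this thread.

```groovy
// nextflow.config (sketch)

process {
    // Re-submit a task when it fails, e.g. on a reset S3 connection.
    errorStrategy = 'retry'
    // Assumption: cap on retries; the value 3 is illustrative only.
    maxRetries = 3
}

aws {
    batch {
        // Limit on concurrent S3 upload/download operations per job.
        maxParallelTransfers = 5
    }
}
```

If the retry should apply only to the processes that stage large annotation files, the same errorStrategy line can go inside a process selector instead of the top-level process scope.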
Added the hg_refGene.txt file to the S3 bucket; testing the ANNOVAR process on EC2.
Added --strandedness option to the RSEM process
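A rough sketch of how a strandedness value might be passed through to RSEM in a Nextflow process. Everything except the --strandedness flag itself is hypothetical: the process name, the input tuple layout, the paired-end flag, and the rsem_ref index prefix are assumptions and may not match the actual pipeline.

```groovy
// Hypothetical sketch; names and inputs are assumptions, not the pipeline's code.
process RSEM {
    input:
    // strandedness is expected to be one of: none, forward, reverse
    tuple val(sample_id), path(bam), val(strandedness)
    // Assumption: directory holding an RSEM reference built with prefix "rsem_ref"
    path rsem_index

    output:
    tuple val(sample_id), path("${sample_id}.genes.results")

    script:
    // --alignments expects a transcriptome-aligned BAM
    """
    rsem-calculate-expression \\
        --alignments --paired-end \\
        --strandedness ${strandedness} \\
        -p ${task.cpus} \\
        ${bam} ${rsem_index}/rsem_ref ${sample_id}
    """
}
```

Passing the value per sample, rather than hard-coding it, would let libraries prepared with different protocols run through the same process.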
Completed end-to-end testing of the MVP on an EC2 instance.
Pushed RSEM with the --strandedness option; completing testing on Biowulf and AWS. Commit: f180c2664bc6a81ec1ec470576a746ae0355408a
Notes on CCDI data:
MCI data is all Exome & Methylation Matched Data - WGS, Exome and RNAseq
Sample Count | Patient ID | Library ID | Diagnosis | Sequencing methods | Reason for selection | DNA sequencing
-- | -- | -- | -- | -- | -- | --
1 | SJ031111 | SJEPD031111_D1_T_H2HL5BBXY | Ependymomal Tumor | ribozero | deep sequencing | no
2 | SJ030446 | SJMB030446_D1_T_HK7CFBBXX | Large Cell/Anaplastic Medulloblastoma | ribozero | deep sequencing | no
3 | PANBMJ | NBL_PANBMJ_T1R_T_C1NELACXX | neuroblastoma | polya | deep sequencing, DHH-RHEBL1 fusion | exome
4 | PANKFE | NBL_PANKFE_T1R_T_81NC5ABXX | neuroblastoma | polya | deep sequencing, DHH-RHEBL1 fusion | exome
5 | TC248 | TC248seq_T_D2AHCACXX | Ewing sarcoma | ribozero | EWSR1-FLI1 | exome
6 | EWS104 | EWS104tumor_T_C1NELACXX | Ewing sarcoma | polya | EWSR1-FLI1 | exome
7 | NCI0064 | NCI0064tumor_T_C291CACXX | rhabdomyosarcoma | ribozero | PAX3-FOXO1 | exome
8 | RMS248 | RMS248_C14C7ACXX | rhabdomyosarcoma | polya | PAX3-FOXO1 | exome
9 | NCI0243 | NCI0243_T_T_HHC2JBGXX | Osteosarcoma | polya_stranded | | exome
10 | NCI0296 | NCI0296_T1R_T_H5VGLBGXY | Desmoplastic small round cell tumor | polya_stranded | EWSR1-WT1 | exome
11 | NCI0263 | NCI0263_T4R_T_HWMY2BGXX | Melanoma | polya_stranded | | exome
12 | CL0263 | CL0263_T1R_T_H7WNMBGXB | rhabdomyosarcoma | access | PAX3-FOXO1 | exome
13 | NCI0246 | NCI0246_T2R_T2_HCNWGBGX7 | Mesothelioma peritoneal | access | STRN-ALK | exome
14 | NCI0246 | NCI0246_T1R_T_H5VGLBGXY | Mesothelioma peritoneal | polya_stranded | STRN-ALK | exome
15 | CL0187 | CL0187_T1R_T2_HCNMNBGX7 | Endometrial stromal sarcoma | access | JAZF1-SUZ12 | exome
16 | CHLADSRCTII | CHLADSRCTII_T1R_T_HCY3KBGXG | Desmoplastic small round cell tumor | access | EWSR1-WT1 | exome
17 | PATADR | RMS2163_T_HKY3VBGX5 | rhabdomyosarcoma | SmartRNA | PAX3-FOXO1 | no
18 | RMS2163 | RMS2163_T1R_T3_HCNMNBGX7 | rhabdomyosarcoma | access | PAX3-FOXO1 | no
19 | RMS2207 | RMS2207_T2R_T2_HKWGGBGX5 | rhabdomyosarcoma | access | PAX7-FOXO1 | panel
20 | RMS2207 | RMS2207_T_HKY3VBGX5 | rhabdomyosarcoma | SmartRNA | PAX7-FOXO1 | panel