bioinfo-chru-strasbourg / STARK

STARK is a Next-Generation Sequencing data analysis pipeline for clinical diagnosis
GNU Affero General Public License v3.0
5 stars, 0 forks

Launching the analysis with STARK #15

Status: Open. Nour-EddineS opened this issue 1 year ago.

Nour-EddineS commented 1 year ago

Dear @antonylebechec,

I tried to analyze a single BAM file with the following command:

STARK --application=GERMLINE --reads=2336.bam --analysis_name=MyFirstAnalysis --design=target_new.bed --results=STARK/output/results --debug

However, the command has been running for more than 22 hours and has not finished yet. Is this normal? For reference, my workstation configuration is:

CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
RAM: 16 GB
Distribution: Ubuntu 22.04.1 LTS

The output of the command is:

#######################################
# STARK #
# Stellar Tools for variants #
# Analysis and RanKing #
# Author: Antony Le Bechec #
# Copyright: HUS #
# License: GNU GPLA V3 #
#######################################

#######################################
# Release: 0.9.18.4 #
# Date: 20220720 #
#######################################

[INFO] Search Application 'GERMLINE'

[INFO] Application 'GERMLINE' found ('/STARK/tools/stark/0.9.18.4/config/apps/GERMLINE.app')

[INFO] Check Samples Analysis

[INFO] Input File '2336.bam' exists

[INFO] Start Samples Analysis

[INFO] /STARK/tools/stark/0.9.18.4/bin/STARK.launch --application=GERMLINE --reads=2336.bam --analysis_name=MyFirstAnalysis --design=target_new.bed --results=STARK/output/results --debug

#######################################
# LaunchSample [0.9.7.2-11/11/2021]
# Launch a Sample Analysis
# Antony Le Bechec @ IRC © GNU-AGPL
#######################################
#[DEBUG] F=/STARK/data/2336.bam B=target_new.bed

[DEBUG] FASTQ= /STARK/data/2336.bam

[DEBUG] Format of input file '/STARK/data/2336.bam' ok

[INFO] FOLDER_REPOSITORY=/STARK/output/repository

[INFO] FOLDER_ARCHIVES=/STARK/output/archives

[INFO] FOLDER_FAVORITES=

[INFO] REPOSITORY=

[INFO] ARCHIVES=

[INFO] FAVORITES=

[INFO] REPOSITORY_FILE_PATTERNS= $SAMPLE.tag $SAMPLE..bam.metrics/$SAMPLE..validation.flags.Design.bed $SAMPLE..bam.metrics/$SAMPLE..validation.flags.Panel.bed $SAMPLE..validation.bam $SAMPLE..validation.bam.bai $SAMPLE.reports/$SAMPLE.final.Panel.tsv $SAMPLE.reports/$SAMPLE.final.Panel.vcf.gz $SAMPLE.reports/$SAMPLE.final.Panel.vcf.gz.tbi $SAMPLE.reports/$SAMPLE.final.vcf.gz $SAMPLE.reports/$SAMPLE.final.vcf.gz.tbi $SAMPLE.reports/$SAMPLE.full.Design.tsv $SAMPLE.reports/$SAMPLE.full.Design.vcf.gz $SAMPLE.reports/$SAMPLE.full.Design.vcf.gz.tbi $SAMPLE.reports/..config $SAMPLE.reports/.report.html $SAMPLE.reports/.report.html.folder:FOLDER

[INFO] ARCHIVES_FILE_PATTERNS= $SAMPLE.genes $SAMPLE.tag $SAMPLE.transcripts $SAMPLE..bam.metrics/$SAMPLE..validation.flags..bed $SAMPLE..bam.metrics/$SAMPLE..validation.flags.Panel.bed $SAMPLE.analysis.json $SAMPLE.archive.cram $SAMPLE.archive.cram.crai $SAMPLE.bed $SAMPLE.manifest $SAMPLE.reports/$SAMPLE.final.Panel.vcf.gz $SAMPLE.reports/$SAMPLE.final.Panel.vcf.gz.tbi $SAMPLE.reports/$SAMPLE.final.tsv $SAMPLE.reports/$SAMPLE.final.vcf.gz $SAMPLE.reports/$SAMPLE.final.vcf.gz.tbi $SAMPLE.reports/$SAMPLE.full.vcf.gz $SAMPLE.reports/$SAMPLE.full.vcf.gz.tbi $SAMPLE.reports/..config $SAMPLE.reports/.report.html $SAMPLE.reports/.report.html.folder:FOLDER

[INFO] FAVORITES_FILE_PATTERNS=

FASTQ /STARK/data/2336.bam
FASTQ_R2
SAMPLE 2336
RUN MyFirstAnalysis
BED target_new.bed
GENES
TRANSCRIPTS
PEDIGREE
INPUT /STARK/data
OUTPUT
RESULTS STARK/output/results
PIPELINES bwamem.gatkHC_GERMLINE.howard bwamem.gatkUG_GERMLINE.howard

[INFO] *** Start Analysis [Tue Jan 31 10:56:13 UTC 2023]

[INFO] *** Input

[INFO] Check Input Files...

[INFO] RUN 'MyFirstAnalysis'

[DEBUG] /STARK/data/2336.bam | | | | | 2336 | MyFirstAnalysis | target_new.bed | | | | bwamem.gatkHC_GERMLINE.howard bwamem.gatkUG_GERMLINE.howard | /STARK/data | STARK/output/results | STARK/output/results/MyFirstAnalysis/2336

[INFO] RUN 'MyFirstAnalysis' - SAMPLE '2336'

[INFO] SAMPLE 'MyFirstAnalysis/2336' from file(s):

[INFO] Read1: /STARK/data/2336.bam

[INFO] Read2:

[INFO] Index1:

[INFO] Index2:

[INFO] Others:

[INFO] Design: target_new.bed

[INFO] Panels:

[INFO] Transcripts:

[INFO] Pedigree:

[INFO] Tags:

[INFO] Create Input data from BAM/CRAM/SAM file

[INFO] FASTQ processing (Adaptors, UMIs, quality)

[INFO] Copy original BED file.

[INFO] Sort/Merge/Normalize BED file.

[INFO] Create LIST.GENES file 'STARK/output/results/MyFirstAnalysis/2336/2336.list.genes' from Design file 'target_new.bed'.

[INFO] Create LIST.GENES file 'STARK/output/results/MyFirstAnalysis/2336/2336.list.genes' with intersection between Design file 'target_new.bed' and RefSeq '/STARK/databases/refGene/current/refGene.hg19.bed'.

[INFO] Generate genes.bed from .genes files within LIST.GENES file 'STARK/output/results/MyFirstAnalysis/2336/2336.list.genes'.

[INFO] Create TAG file.

[INFO] Create Analysis TAG file.

[INFO] Copy SampleSheet.

[INFO] *** Configuration

[INFO] * ANALYSIS

[INFO] ANALYSIS NAME MyFirstAnalysis

[INFO] ANALYSIS TAG

[INFO] * SAMPLES

[INFO] SAMPLE NAMES 2336

[INFO] SAMPLE TAG

[INFO] FASTQ/BAM/CRAM /STARK/data/2336.bam

[INFO] FASTQ R2

[INFO] INDEX1

[INFO] INDEX2

[INFO] OTHER_FILES

[INFO] DESIGN target_new.bed

[INFO] GENES STARK/output/results/MyFirstAnalysis/2336/2336.from_design.genes

[INFO] TRANSCRIPTS

[INFO] PEDIGREE

[INFO] * APPLICATION

[INFO] APPLICATION NAME GERMLINE:1.0

[INFO] APPLICATION FILE GERMLINE.app

[INFO] GROUP GENETIC

[INFO] PROJECT GERMLINE

[INFO] PIPELINES bwamem.gatkHC_GERMLINE.howard bwamem.gatkUG_GERMLINE.howard

[INFO] POST SEQUENCING

[INFO] POST ALIGNMENT sorting markduplicates realignment recalibration compress

[INFO] POST CALLING

[INFO] POST ANNOTATION

[INFO] RESULTS STARK/output/results

[INFO] REPOSITORY

[INFO] ARCHIVES

[INFO] FAVORITES

[INFO] RELEASE INFOS STARK/output/results/MyFirstAnalysis/STARK.20230131-105613.analysis.release

[INFO] MAKEFILE CONFIGURATION STARK/output/results/MyFirstAnalysis/STARK.20230131-105613.analysis.param.mk

[INFO] LOGFILE STARK/output/results/MyFirstAnalysis/STARK.20230131-105613.analysis.log

[INFO] THREADS 7

[INFO] THREADS_BY_SAMPLE 7

[INFO] THREADS_COPY 1

[INFO] *** Process

[INFO] STARK Input Processing...

[INFO] Process Input data from FASTQ file(s) - multithreading mode [7]

[INFO] STARK Input Processing done.

[INFO] STARK Analysis Processing...

Best regards,

antonylebechec commented 1 year ago

Dear @Nour-EddineS,

First of all, your command looks correct.

Second, the execution time of your command depends on the input data (e.g. the BAM and BED files) and on the available resources (e.g. CPU, memory, network, disk).
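To gather the input-data and resource figures mentioned above, a quick inventory can be taken with standard Linux tools. This is a sketch, not part of STARK: the helper names (`bed_stats`, `show_inputs`, `show_resources`) are hypothetical, and it assumes a standard BED file whose first three columns are chrom, start and end, plus coreutils `nproc` and procps `free` as shipped with Ubuntu.

```shell
#!/bin/sh
# Sketch (not part of STARK): summarise what drives STARK run time.
# Hypothetical helper names; assumes a standard BED (chrom, start, end, ...).

# Print "<regions> <bases>" for a BED file, skipping header/comment lines.
bed_stats() {
  awk '!/^(#|track|browser)/ && NF >= 3 { n++; bp += $3 - $2 }
       END { printf "%d %d\n", n, bp }' "$1"
}

# Input data: file sizes, then number of regions and total bases in the design.
show_inputs() {
  ls -lh "$1" "$2"
  echo "design (regions, bases): $(bed_stats "$2")"
}

# Resources: CPU threads, memory, and free disk space in the current folder.
show_resources() {
  echo "CPU threads: $(nproc)"
  free -h
  df -h .
}
```

With the files from this thread, this would be run as `show_inputs 2336.bam target_new.bed` followed by `show_resources`.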

About input data, can you provide:

About resources:

Furthermore, the GERMLINE application is configured with 1 aligner (bwamem) and 2 callers (gatkHC_GERMLINE and gatkUG_GERMLINE). GATK UnifiedGenotyper (gatkUG) in particular is slow and deprecated. I suggest using the default application, which is configured with a single caller (gatkHC, HaplotypeCaller). You can also configure your own application if needed.
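The `[INFO] PIPELINES` line in the log lists each pipeline as a dot-separated name. Splitting those names makes the configuration described above visible: both pipelines share the bwamem aligner and differ only by caller, so removing gatkUG_GERMLINE would leave a single pipeline. This is a sketch under an assumption: the names are read as `ALIGNER.CALLER.ANNOTATOR`, where the first two parts match the explanation above and treating `howard` as the annotation step is a guess; `split_pipeline` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: split STARK pipeline names into their parts.
# Assumption: names follow ALIGNER.CALLER.ANNOTATOR ("howard" taken as annotation).

# Print the parts of one pipeline name, e.g. bwamem.gatkHC_GERMLINE.howard
split_pipeline() {
  aligner=${1%%.*}   # text before the first dot
  rest=${1#*.}       # text after the first dot
  caller=${rest%%.*}
  annotator=${rest#*.}
  echo "aligner=$aligner caller=$caller annotator=$annotator"
}

# Pipelines copied from the "[INFO] PIPELINES" line of the log
PIPELINES="bwamem.gatkHC_GERMLINE.howard bwamem.gatkUG_GERMLINE.howard"
for p in $PIPELINES; do
  split_pipeline "$p"
done
```

This prints `aligner=bwamem caller=gatkHC_GERMLINE annotator=howard` and then `aligner=bwamem caller=gatkUG_GERMLINE annotator=howard`.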

Also keep in mind that STARK generates metrics, annotations, reports and other information that are useful for data interpretation but increase execution time compared to a simple pipeline (alignment and calling only).

Finally, to check whether your analysis is actually progressing, can you have a look at the running processes and the generated data?
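The checks suggested above can be scripted roughly as follows. This is a sketch, not a STARK command: the helper names are hypothetical, it assumes Linux procps `ps` (for `--sort`) and a `find` that supports `-mmin`, and it relies on the log naming seen in this run (`STARK.<date>-<time>.analysis.log` under the analysis results folder).

```shell
#!/bin/sh
# Sketch (not part of STARK): check whether a long STARK run is progressing.

# Top N processes by CPU usage, with elapsed time (procps ps, Linux).
show_processes() {
  ps -eo pid,etime,pcpu,pmem,args --sort=-pcpu | head -n "${1:-15}"
}

# Files under DIR modified in the last N minutes.
recent_files() {
  find "$1" -type f -mmin "-${2:-30}"
}

# Last N lines of the newest STARK analysis log in DIR.
tail_latest_log() {
  log=$(ls -t "$1"/STARK.*.analysis.log 2>/dev/null | head -n 1)
  if [ -n "$log" ]; then
    tail -n "${2:-20}" "$log"
  else
    echo "no STARK.*.analysis.log in $1" >&2
    return 1
  fi
}
```

For this run, that would be `show_processes`, `recent_files STARK/output/results/MyFirstAnalysis 60` and `tail_latest_log STARK/output/results/MyFirstAnalysis`. If no tool is using CPU and no file has changed for a long time, the analysis may be stalled rather than just slow.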

Hope this helps!

Best regards,