Closed GACGAMA closed 8 months ago
Does the BAM file have contigs (e.g., chr1_KI270766v1_alt) that do not exist in the genome reference file?
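One way to answer that question is to compare the contig names in the BAM header with those in the reference index. The sketch below assumes the header text was obtained with `samtools view -H` and the index text is the contents of the reference `.fai` file; the function names and the toy input values are illustrative only.

```python
def contigs_from_sam_header(header_text):
    """Return contig names (SN tags) from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def contigs_from_fai(fai_text):
    """Return contig names (first column) from a FASTA index (.fai)."""
    return [row.split("\t")[0] for row in fai_text.splitlines() if row.strip()]


def missing_from_reference(header_text, fai_text):
    """List BAM contigs that are absent from the reference index."""
    reference = set(contigs_from_fai(fai_text))
    return [c for c in contigs_from_sam_header(header_text) if c not in reference]


# Toy inputs (lengths and offsets are placeholders, not real values).
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr1_KI270766v1_alt\tLN:500\n"
)
example_fai = "chr1\t1000\t6\t60\t61\n"
print(missing_from_reference(example_header, example_fai))
```

An empty list means every contig in the BAM header is also present in the reference.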
One possible workaround is to create a BED file with only the real chromosomes and pass it before the paired option as --inclusion-region.
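Such a BED file could be derived from the reference `.fai` index. This is a minimal sketch that assumes "real chromosomes" means chr1 to chr22, chrX, chrY and chrM with UCSC-style `chr` names; adjust the pattern if your reference differs. The function name and toy input are illustrative.

```python
import re

# Assumption: primary contigs are chr1..chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def primary_bed_lines(fai_text):
    """Build whole-chromosome BED lines for primary contigs listed in a .fai."""
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue
        name, length = row.split("\t")[:2]
        if PRIMARY.match(name):
            # BED is 0-based, half-open: the full contig is 0..length.
            lines.append(f"{name}\t0\t{int(length)}")
    return lines


# Toy index (lengths and offsets are placeholders, not real values).
toy_fai = (
    "chr1\t1000\t6\t60\t61\n"
    "chr1_KI270766v1_alt\t500\t1100\t60\t61\n"
    "chrX\t800\t1700\t60\t61\n"
)
print("\n".join(primary_bed_lines(toy_fai)))
```

Writing the returned lines to a file gives a BED that can be passed with --inclusion-region.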
I will verify, but I'm using the reference file and BAM files from the SEQC2 recommendation, so it should be fine. I will be testing with the provided High-Confidence_Regions_v1.2.bed, though.
This is the genome reference SEQC2 used: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/technical/reference_genome/GRCh38/ Which BAM files did you use?
I'm using the following tumor BAM files, each with its respective paired normal:
FFG_GZ_T_24h-B
FFG_GZ_T_2h-A
FFG_IL_T_1h
FFG_IL_T_2h
FFG_GZ_T_24h-F
FFG_GZ_T_6h-A
FFG_IL_T_24h
Rerunning the files with --inclusion-region High-Confidence_Regions_v1.2.bed did work. Still, the FFPE files had to be re-headered with Picard to work.
Hello! I'm trying to train the AI model with different inputs such as WGS, TITRATION, SYNTHETIC and FFPE. I'm running SomaticSeq on a Slurm cluster, with Singularity wherever possible.
I have had many errors with FFPE samples because the BAM files were not named correctly according to normal-tumor pairs. To correct this, I used:
picard AddOrReplaceReadGroups I=/data/nsobrei2/ggama1/training/bams/FFPE/{1}.bwa.dedup.bam O=/data/nsobrei2/ggama1/training/bams/FFPE_reheader/{1}.bwa.dedup.bam RGID={1} RGLB=na RGPL=ILLUMINA RGPU=na RGSM={1} RGDS={1}.bwa.dedup.bam
This basically adds the file name as the sample name in the read groups. After that, all mutation callers ran without any errors.
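A quick sanity check after re-headering is to confirm that the tumor and normal BAMs carry different SM values. The sketch below assumes the header text of each BAM was obtained with `samtools view -H`; the function names and the normal sample name are hypothetical.

```python
def sample_names(header_text):
    """Collect the SM (sample) tags from the @RG lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                names.add(field[3:])
    return names


def samples_are_distinct(tumor_header, normal_header):
    """True if both headers have SM tags and share none of them."""
    tumor, normal = sample_names(tumor_header), sample_names(normal_header)
    return bool(tumor) and bool(normal) and tumor.isdisjoint(normal)


# Toy headers; "normal_example" is a made-up sample name.
tumor_hdr = "@RG\tID:FFG_GZ_T_24h-B\tPL:ILLUMINA\tSM:FFG_GZ_T_24h-B\n"
normal_hdr = "@RG\tID:normal_example\tPL:ILLUMINA\tSM:normal_example\n"
print(samples_are_distinct(tumor_hdr, normal_hdr))
```

If the check returns False, either an SM tag is missing or both files share a sample name, which is the situation the Picard step above is meant to fix.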
But when creating the consensus
somaticseq_parallel.py --output-directory \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/consensus_vcf --genome-reference \$HUMAN_REFERENCE_PATH --dbsnp \$DBSNP_PATH --threads \$THREADS --truth-snv \$TRUTH_SNV --truth-indel \$TRUTH_INDEL paired --tumor-bam-file \$filepathsstarR1 --normal-bam-file \$filepathsstarR2 --mutect2-vcf \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/mutect2/MuTect2.vcf.gz --vardict-vcf \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/vardict/VarDict.vcf.gz --somaticsniper-vcf \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/somaticsniper/SomaticSniper.vcf --muse-vcf \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/muse/MuSE.vcf.gz --strelka-snv \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/strelka2/Strelka.snv.vcf.gz --strelka-indel \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/strelka2/Strelka.indel.vcf.gz --varscan-snv \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/varscan2/VarScan2.snv.vcf.gz --varscan-indel \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/varscan2/VarScan2.indel.vcf.gz --lofreq-snv \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/lofreq/LoFreq.snv.vcf.gz --lofreq-indel \$BASE_PATH/somaticseq/vcf_per_sample/$EXPERIMENT_PATH/\$filenames/lofreq/LoFreq.indel.vcf.gz
This runs, but then gives me the following error for all FFPE samples (SYNTHETIC did work, so the program itself is running fine):