cancerit / dockstore-cgpwgs

Dockstore implementation of CGP core WGS analysis
GNU Affero General Public License v3.0

Error when running under singularity #63

Closed calvinhuapeng closed 3 years ago

calvinhuapeng commented 3 years ago

Hello, I tried to run cgpwgs under singularity but an error occurred at the following step:

bash -c '/usr/bin/time -v brass.pl -j 4 -k 4 -c 48  -d /var/spool/results/reference_files/brass/HiDepth.bed.gz  -f /var/spool/results/reference_files/brass/brass_np.groups.gz  -g /var/spool/results/reference_files/genome.fa  -s '\''Human'\'' -as NCBI38 -pr WGS -pl ILLUMINA  -g_cache /var/spool/results/reference_files/vagrent/vagrent.cache.gz  -vi /var/spool/results/reference_files/brass/viral.genomic.fa.2bit  -mi /var/spool/results/reference_files/brass/all_ncbi_bacteria  -b /var/spool/results/reference_files/brass/500bp_windows.gc.bed.gz  -ct /var/spool/results/reference_files/brass/CentTelo.tsv  -cb /var/spool/results/reference_files/brass/cytoband.txt  -t /var/spool/results/tmp/TUM1.bam  -n /var/spool/results/tmp/WT1.bam  -o /var/spool/results/WGS_TUM1_vs_WT1/brass  -p input >& /var/spool/results/timings/WGS_TUM1_vs_WT1.time.BRASS_input ; echo '\''WRAPPER_EXIT: '\''$?'
ERRORS OCCURRED:
/var/spool/results/ascat.wrapper.log
/var/spool/results/BRASS_input.wrapper.log

> cat ascat.wrapper.log
WRAPPER_EXIT: 1

> cat tmpAscat/logs/Sanger_CGP_Ascat_Implement_ascat.0.err
...
Error in apply(corr_tot, 1, function(x) sum(abs(x * length_tot))/sum(length_tot)) : 
  dim(X) must have a positive length
Calls: ascat.GCcorrect -> apply
Execution halted
Command exited with non-zero status 1

Wondering if anyone can help with it? Thanks.

(repo owner edit: corrected formatting)

keiranmraine commented 3 years ago

Please provide the command used to execute the container.

At first glance the ASCAT error is indicative of the tumour/control not being matched data. Some other causes could be:

  1. Are the reference files matched to the data (37 vs 38)? You appear to specify NCBI38.
  2. Are the contig name formats matched between the BAM/CRAM and the reference files (chr prefix?)
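
One way to check the second point is to compare the `@SQ` names in the BAM header with the contig names in the reference index. The following is a minimal sketch, not part of cgpwgs: `missing_contigs` is a hypothetical helper, and it assumes the header has been saved to a text file (for example with `samtools view -H TUM1.bam > TUM1.header.txt`) and that a `.fai` index exists for `genome.fa`.

```shell
# Hypothetical helper (not part of cgpwgs): print every contig named in a
# SAM/BAM header that is absent from a FASTA index.
#   $1: text file holding the header, e.g. from `samtools view -H`
#   $2: FASTA index (.fai); its first column is the contig name
missing_contigs() {
  awk -F'\t' '
    FILENAME == ARGV[1] { ref[$1] = 1; next }   # first file read: the .fai
    $1 == "@SQ" {                               # second file: header lines
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)                  # strip the "SN:" tag
          if (!(name in ref)) print name
        }
    }
  ' "$2" "$1"
}
```

If the function prints nothing, every contig in the header is also present in the index. Output such as `1` next to an index that lists `chr1` would point to a chr-prefix mismatch.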

calvinhuapeng commented 3 years ago

Thanks for the reply. Both samples were mapped against 38 using cgpmap_3.2.0, and the command for executing cgpwgs is as follows:

singularity exec \
  --cleanenv \
  --workdir ./01_wgs \
  --home ./01_wgs \
  --bind ./ref_38:/var/spool/ref:ro \
  --bind ./00_mapping:/var/spool/data:ro \
  --bind ./01_wgs:/var/spool/results \
  dockstore-cgpwgs_2.1.0.sif \
  ds-cgpwgs.pl \
  -r /var/spool/ref/core_ref_GRCh38_hla_decoy_ebv.tar.gz \
  -a /var/spool/ref/VAGrENT_ref_GRCh38_hla_decoy_ebv_ensembl_91.tar.gz \
  -si /var/spool/ref/SNV_INDEL_ref_GRCh38_hla_decoy_ebv-fragment.tar.gz \
  -cs /var/spool/ref/NV_SV_ref_GRCh38_hla_decoy_ebv_brass6+.tar.gz \
  -qc /var/spool/ref/qcGenotype_GRCh38_hla_decoy_ebv.tar.gz \
  -pl 3.65 -pu 1.0 \
  -e 'MT,GL%,hs37d5,NC_007605' \
  -t /var/spool/data/TUM1.bam \
  -tidx /var/spool/data/TUM1.bam.bai \
  -n /var/spool/data/WT1.bam \
  -nidx /var/spool/data/WT1.bam.bai \
  -o /var/spool/results

keiranmraine commented 3 years ago

Are the inputs whole genome data? This isn't something we would expect to see unless pulldown or targeted data is in use (which cannot be analysed with this flow).

There should be 2 files matching ls tmpAscat/ascat/*.count. Please provide the result of the following, replacing TUM1/WT1 with path to the above files:

echo -n 'TUM1 zero:' && grep -cP '\t0$' $TUM1.count
echo -n 'TUM1 data:' && grep -cvP '\t0$' $TUM1.count
echo -n ' WT1 zero:' && grep -cP '\t0$' $WT1.count
echo -n ' WT1 data:' && grep -cvP '\t0$' $WT1.count
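
The four commands above can also be wrapped in a small helper that prints both counts for one file. This is only a sketch: `count_summary` is a hypothetical name, and it assumes, as the grep patterns do, that a "zero" line is one whose last tab-separated field is exactly `0`.

```shell
# Hypothetical helper (not part of cgpwgs): summarise one allele-count file.
# "zero" = lines whose last tab-separated field is exactly 0 (grep -cP '\t0$')
# "data" = every other line, header included   (grep -cvP '\t0$')
count_summary() {
  file="$1"
  awk -F'\t' -v name="$file" '
    NF > 1 && $NF == "0" { zero++; next }
    { data++ }
    END { printf "%s zero:%d data:%d\n", name, zero, data }
  ' "$file"
}
```

Calling `count_summary` once per `.count` file gives the same numbers as the grep pairs. awk is used instead of `grep -P` so the helper does not depend on GNU grep.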

calvinhuapeng commented 3 years ago

Thanks Keiran. Yes, both are whole genome sequencing data. The raw paired-end fastq files were adapter-trimmed and then merged into single reads with FLASH before mapping.

echo -n 'TUM1 zero:' && grep -cP '\t0$' TUM1.count
TUM1 zero:1850603
echo -n 'TUM1 data:' && grep -cvP '\t0$' TUM1.count
TUM1 data:1
echo -n ' WT1 zero:' && grep -cP '\t0$' WT1.count
WT1 zero:1850603
echo -n ' WT1 data:' && grep -cvP '\t0$' WT1.count
WT1 data:1

keiranmraine commented 3 years ago

All of the tools expect traditional paired-end inputs, not merged reads. The data above shows that no reads were used to generate allele counts, because they are not paired-end alignments.

The tools are fragment aware and won't double count due to overlapping reads.
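
A quick way to confirm whether a BAM holds paired-end alignments before running the workflow is to look at SAM flag bit 0x1 ("read paired"). The sketch below uses a hypothetical helper, `paired_summary`, that reads SAM text on stdin. It only inspects the flag column and makes no claim about the exact filters the CGP tools apply.

```shell
# Hypothetical helper (not part of cgpwgs): read SAM records on stdin and
# report how many have flag bit 0x1 (read paired) set.
paired_summary() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    { total++; if ($2 % 2 == 1) paired++ } # an odd FLAG value means bit 0x1 is set
    END { printf "paired:%d total:%d\n", paired, total }
  '
}
```

For example, `samtools view TUM1.bam | head -n 100000 | paired_summary` samples the first records of the tumour BAM. A result with `paired:0` would be consistent with merged, single-end input. `samtools flagstat` reports the same information for the whole file.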