PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Truncated/Malformed GRIDSS VCFs #580

Closed jamesdalg closed 1 year ago

jamesdalg commented 2 years ago

I've had this happen several times on several samples recently. Maybe it's an easy/small thing, but I'm not understanding why GRIDSS is producing truncated files. I've included the GRIDSS log, but it seems like everything ran fine and yet the VCF is still truncated. Not sure why this is or why it keeps happening.

module load gridss samtools R java/17.0.2 bcftools; java -jar /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss_v2.1.jar -sample PALHRL_T -reference PALHRL_N -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa -pon_sv_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe -pon_sgl_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed -known_hotspot_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe -vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/ > gripss_log.txt
[-] Unloading gridss 2.13.2 on cn0882
[+] Loading gridss 2.13.2 on cn0882
[-] Unloading singularity 3.8.5-1 on cn0882
[+] Loading singularity 3.8.5-1 on cn0882
[-] Unloading samtools 1.15 ...
[+] Loading samtools 1.15 ...
[-] Unloading gcc 9.2.0 ...
[-] Unloading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading openmpi 4.0.5 for GCC 9.2.0
[-] Unloading ImageMagick 7.0.8 on cn0882
[-] Unloading HDF5 1.10.4
[-] Unloading NetCDF 4.7.4_gcc9.2.0
[-] Unloading pandoc 2.17.1.1 on cn0882
[-] Unloading pcre2 10.21 ...
[-] Unloading R 4.1.3
[+] Loading gcc 9.2.0 ...
[+] Loading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading openmpi 4.0.5 for GCC 9.2.0
[+] Loading ImageMagick 7.0.8 on cn0882
[+] Loading HDF5 1.10.4
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc 2.17.1.1 on cn0882
[+] Loading pcre2 10.21 ...
[+] Loading R 4.1.3
[-] Unloading java 17.0.2 ...
[+] Loading java 17.0.2 ...
[-] Unloading samtools 1.13 ...
[+] Loading samtools 1.13 ...
Exception in thread "main" htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Premature end of file: /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz, for input source: /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz
        at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:97)
        at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:82)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:117)
        at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:81)
        at com.hartwig.hmftools.gripss.GripssApplication.processVcf(GripssApplication.java:113)
        at com.hartwig.hmftools.gripss.GripssApplication.run(GripssApplication.java:108)
        at com.hartwig.hmftools.gripss.GripssApplication.main(GripssApplication.java:336)
Caused by: htsjdk.samtools.FileTruncatedException: Premature end of file: /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz
        at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
        at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
        at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
        at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
        at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
        at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:257)
        at htsjdk.tribble.readers.PositionalBufferedStream.fill(PositionalBufferedStream.java:132)
        at htsjdk.tribble.readers.PositionalBufferedStream.read(PositionalBufferedStream.java:84)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177)
        at htsjdk.tribble.readers.LongLineBufferedReader.fill(LongLineBufferedReader.java:140)
        at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:300)
        at htsjdk.tribble.readers.LongLineBufferedReader.readLine(LongLineBufferedReader.java:356)
        at htsjdk.tribble.readers.SynchronousLineReader.readLine(SynchronousLineReader.java:51)
        at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:24)
        at htsjdk.tribble.readers.LineIteratorImpl.advance(LineIteratorImpl.java:11)
        at htsjdk.samtools.util.AbstractIterator.hasNext(AbstractIterator.java:44)
        at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:89)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
        at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
        at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:95)

Logs attached: [gridss.full.20220502_151012.cn2414.67676.log](https://github.com/hartwigmedical/hmftools/files/8605970/gridss.full.20220502_151012.cn2414.67676.log), [gridss.timing.20220502_151012.cn2414.67676.log](https://github.com/hartwigmedical/hmftools/files/8605971/gridss.timing.20220502_151012.cn2414.67676.log), [gripss_log.txt](https://github.com/hartwigmedical/hmftools/files/8605946/gripss_log.txt)

jamesdalg commented 2 years ago

I've replicated this issue with GRIDSS 2.12.2 and 2.13.2. I've also seen this issue (notice the first line in the ALT field, there's a period there and GRIPSS then calls the file malformed):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample
chr1    10000   gridss0b_1b     N       .AACCCTAACCN    4500.73 NO_SR   AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=2259.04;BASRP=89;BASSR=0;BEID=asm0-27782;BEIDH=-1;BEIDL=10;BMQ=26.25;BMQN=20.00;BMQX=42.00;BQ=4500.73;BSC=0;BSCQ=0.00;BUM=86;BUMQ=2241.69;BVF=92;CAS=0;CASQ=0.00;CQ=4832.73;EVENT=gridss0b_1;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=23;REFPAIR=0;RP=0;RPQ=0.00;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0      GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:0.800:0.00:0:0:0:0.00:0:0.00:2259.04:89:0:4500.73:0:0.00:86:2241.69:92:0.00:0:0.00:0.00:0.00:23:0:0:0.00:0:0.00:0
chr1    10151   gridss0f_3b     T       TTAACCCTAACCC.  422.48  ASSEMBLY_BIAS;LOW_QUAL;NO_RP    AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=390.48;BASRP=16;BASSR=0;BEID=asm0-27779;BEIDH=-1;BEIDL=0;BMQ=35.00;BMQN=32.00;BMQX=38.00;BQ=422.48;BSC=1;BSCQ=32.00;BUM=0;BUMQ=0.00;BVF=17;CAS=0;CASQ=0.00;CQ=2040.61;EVENT=gridss0f_3;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=396;REFPAIR=1513;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0 GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:8.827e-03:0.00:0:0:0:0.00:0:0.00:390.48:16:0:422.48:1:32.00:0:0.00:17:0.00:0:0.00:0.00:0.00:396:1513:0:0.00:0:0.00:0
chr1    10347   gridss0fb_6o    A       A[chr1:10359[   122.80  LOW_QUAL;NO_ASSEMBLY    AS=0;ASC=1X11N1X;ASQ=0.00;ASRP=0;ASSR=0;BA=0;BANRP=3;BANRPQ=43.78;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BMQ=38.07;BMQN=20.00;BMQX=60.00;BQ=1857.63;BSC=1;BSCQ=22.49;BUM=54;BUMQ=1835.14;BVF=55;CAS=0;CASQ=0.00;CIPOS=-6,6;CIRPOS=-6,6;CQ=122.80;EVENT=gridss0fb_6;HOMLEN=12;HOMSEQ=ACCCTAACCCTA;IC=2;IHOMPOS=-6,6;IQ=64.60;MATEID=gridss0fb_6h;MQ=32.17;MQN=22.00;MQX=40.00;RAS=0;RASQ=0.00;REF=327;REFPAIR=1314;RP=4;RPQ=58.20;SB=0.33333334;SC=104M125D41M17D44M1X11N1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=6        GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.018:0.00:0:0:3:43.78:0:0.00:0.00:0:0:1857.63:1:22.49:54:1835.14:55:0.00:2:64.60:122.80:0.00:327:1314:4:58.20:0:0.00:6
chr1    10358   gridss0b_14b    A       .AACCCTAACCA    235.76  ASSEMBLY_BIAS;LOW_QUAL;NO_RP;NO_SR      AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=235.76;BASRP=7;BASSR=1;BEID=asm0-2;BEIDH=-1;BEIDL=10;BMQ=40.00;BMQN=40.00;BMQX=40.00;BQ=235.76;BSC=0;BSCQ=0.00;BUM=0;BUMQ=0.00;BVF=8;CAS=0;CASQ=0.00;CQ=415.90;EVENT=gridss0b_14;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=695;REFPAIR=1120;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0       GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF      .:4.388e-03:0.00:0:0:0:0.00:0:0.00:235.76:7:1:235.76:0:0.00:0:0.00:8:0.00:0:0.00:0.00:0.00:695:1120:0:0.00:0:0.00:0
chr1    10359   gridss0fb_6h    A       ]chr1:10347]A   122.80  LOW_QUAL;NO_ASSEMBLY    AS=0;ASC=1X5N1X;ASQ=0.00;ASRP=0;ASSR=0;BA=0;BANRP=3;BANRPQ=43.78;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BMQ=32.36;BMQN=21.00;BMQX=40.00;BQ=407.21;BSC=4;BSCQ=100.21;BUM=10;BUMQ=307.00;BVF=14;CAS=0;CASQ=0.00;CIPOS=-6,6;CIRPOS=-6,6;CQ=122.80;EVENT=gridss0fb_6;HOMLEN=12;HOMSEQ=ACCCTAACCCTA;IC=2;IHOMPOS=-6,6;IQ=64.60;MATEID=gridss0fb_6o;MQ=32.17;MQN=22.00;MQX=40.00;RAS=0;RASQ=0.00;REF=450;REFPAIR=981;RP=4;RPQ=58.20;SB=0.16666667;SC=1X5N1X102M;SR=0;SRQ=0.00;SVTYPE=BND;VF=6 GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.013:0.00:0:0:3:43.78:0:0.00:0.00:0:0:407.21:4:100.21:10:307.00:14:0.00:2:64.60:122.80:0.00:450:981:4:58.20:0:0.00:6
chr1    10385   gridss0fb_8o    C       CCTAACCCT[chr1:10394[   54.52   LOW_QUAL;SINGLE_ASSEMBLY        AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=2;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-27781;BEIDH=0;BEIDL=0;BQ=0.00;BSC=0;BSCQ=0.00;BUM=0;BUMQ=0.00;BVF=0;CAS=0;CASQ=0.00;CQ=54.52;EVENT=gridss0fb_8;IC=0;IHOMPOS=0,0;IQ=0.00;MATEID=gridss0fb_8h;MQ=39.00;MQN=39.00;MQX=39.00;RAS=1;RASQ=54.52;REF=856;REFPAIR=840;RP=0;RPQ=0.00;SB=0.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=2     GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:2.331e-03:0.00:0:2:0:0.00:0:0.00:0.00:0:0:0.00:0:0.00:0:0.00:0:0.00:0:0.00:54.52:54.52:856:840:0:0.00:0:0.00:2
chr1    10394   gridss0fb_8h    T       ]chr1:10385]CTAACCCTT   54.52   LOW_QUAL;SINGLE_ASSEMBLY        AS=1;ASC=1X;ASQ=54.52;ASRP=0;ASSR=2;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-27781;BEIDH=0;BEIDL=0;BMQ=30.00;BMQN=30.00;BMQX=30.00;BQ=22.26;BSC=1;BSCQ=22.26;BUM=0;BUMQ=0.00;BVF=1;CAS=0;CASQ=0.00;CQ=54.52;EVENT=gridss0fb_8;IC=0;IHOMPOS=0,0;IQ=0.00;MATEID=gridss0fb_8o;MQ=39.00;MQN=39.00;MQX=39.00;RAS=0;RASQ=0.00;REF=731;REFPAIR=725;RP=0;RPQ=0.00;SB=0.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=2   GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:2.729e-03:54.52:0:2:0:0.00:0:0.00:0.00:0:0:22.26:1:22.26:0:0.00:1:0.00:0:0.00:54.52:0.00:731:725:0:0.00:0:0.00:2
chr1    10548   gridss0ff_3o    C       C]chr15:101980741]      104.01  LOW_QUAL;SINGLE_ASSEMBLY        AS=1;ASC=1X174N1X;ASQ=59.54;ASRP=4;ASSR=0;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-24;BEIDH=2;BEIDL=0;BMQ=30.21;BMQN=21.00;BMQX=49.00;BQ=354.49;BSC=7;BSCQ=169.49;BUM=7;BUMQ=185.00;BVF=14;CAS=0;CASQ=0.00;CIPOS=-87,88;CIRPOS=-87,88;CQ=104.01;EVENT=gridss0ff_3;IC=0;IMPRECISE;IQ=0.00;MATEID=gridss0ff_3h;MQ=26.25;MQN=20.00;MQX=38.00;RAS=0;RASQ=0.00;REF=1;REFPAIR=0;RP=3;RPQ=44.47;SB=0.0;SC=115M1X174N1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=4  GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.800:59.54:4:0:0:0.00:0:0.00:0.00:0:0:354.49:7:169.49:7:185.00:14:0.00:0:0.00:104.01:0.00:1:0:3:44.47:0:0.00:4
-bash-4.2$

code used to generate the file:

rm -f /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.*;rm -r /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/;mkdir -p /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/;mkdir -p /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/;module load gridss/2.12.2 samtools R java/17.0.2; mkdir -p /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/; gridss -r /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa -w /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/ -a /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam -o /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz -t 32 /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam;
[+] Loading gridss  2.12.2  on cn3160
[+] Loading singularity  3.8.5-1  on cn3160
[-] Unloading samtools 1.15  ...
[+] Loading samtools 1.15  ...
[+] Loading gcc  9.2.0  ...
[+] Loading GSL 2.6 for GCC 9.2.0 ...
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading openmpi 4.0.5  for GCC 9.2.0
[+] Loading ImageMagick  7.0.8  on cn3160
[+] Loading HDF5  1.10.4
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc  2.17.1.1  on cn3160
[+] Loading pcre2 10.21  ...
[+] Loading R 4.1.3
[+] Loading java 17.0.2  ...
Using working directory "/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/"
Fri May  6 06:33:51 EDT 2022: Full log file is: /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/gridss.full.20220506_063351.cn3160.2696.log
Fri May  6 06:33:51 EDT 2022: Found /usr/bin/time
Fri May  6 06:33:51 EDT 2022: Using GRIDSS jar /opt/gridss/gridss-2.12.2-gridss-jar-with-dependencies.jar
Fri May  6 06:33:51 EDT 2022: Using reference genome "/data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa"
Fri May  6 06:33:51 EDT 2022: Using output VCF /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 06:33:51 EDT 2022: Using assembly bam /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam
Fri May  6 06:33:51 EDT 2022: WARNING: GRIDSS scales sub-linearly at high thread count. Up to 8 threads is the recommended level of parallelism.
Fri May  6 06:33:51 EDT 2022: Using 32 worker threads.
Fri May  6 06:33:51 EDT 2022: Using no blacklist bed. The encode DAC blacklist is recommended for hg19.
Fri May  6 06:33:51 EDT 2022: Using JVM maximum heap size of 30g for assembly and variant calling.
Fri May  6 06:33:51 EDT 2022: Using input file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam
Fri May  6 06:33:51 EDT 2022: Using input file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam
Fri May  6 06:33:51 EDT 2022: Found /usr/bin/Rscript
Fri May  6 06:33:51 EDT 2022: Found /usr/bin/samtools
Fri May  6 06:33:51 EDT 2022: Found /usr/bin/java
Fri May  6 06:33:51 EDT 2022: Found /usr/bin/bwa
Fri May  6 06:33:51 EDT 2022: samtools version: 1.10+htslib-1.10.2-3
Fri May  6 06:33:51 EDT 2022: R version: R scripting front-end version 4.1.0 (2021-05-18)
Fri May  6 06:33:51 EDT 2022: bwa Version: 0.7.17-r1188
Fri May  6 06:33:51 EDT 2022: time version: GNU time 1.7
Fri May  6 06:33:51 EDT 2022: bash version: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Fri May  6 06:33:52 EDT 2022: java version: openjdk version "11.0.11" 2021-04-20        OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)     OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
Fri May  6 06:33:52 EDT 2022: Max file handles: 131072
Fri May  6 06:33:52 EDT 2022: Running GRIDSS steps: setupreference, preprocess, assemble, call,
Fri May  6 06:33:52 EDT 2022: Start pre-processing      /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam
Fri May  6 06:33:52 EDT 2022: Running   CollectGridssMetrics    /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam   first 10000000 records
Fri May  6 06:34:48 EDT 2022: Running   CollectGridssMetricsAndExtractSVReads|samtools  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam
Fri May  6 06:55:17 EDT 2022: Running   PreprocessForBreakendAssembly|samtools  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam
Fri May  6 07:07:36 EDT 2022: Complete pre-processing   /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_N.bam
Fri May  6 07:07:36 EDT 2022: Start pre-processing      /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam
Fri May  6 07:07:36 EDT 2022: Running   CollectGridssMetrics    /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam   first 10000000 records
Fri May  6 07:08:43 EDT 2022: Running   CollectGridssMetricsAndExtractSVReads|samtools  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam
Fri May  6 07:28:58 EDT 2022: Running   PreprocessForBreakendAssembly|samtools  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam
Fri May  6 07:40:38 EDT 2022: Complete pre-processing   /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/bam_hg38/PALZGU_T.bam
Fri May  6 07:40:38 EDT 2022: Start assembly    /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam
Fri May  6 07:40:38 EDT 2022: Running   AssembleBreakends       /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam    job 0   total jobs 1
Fri May  6 08:17:15 EDT 2022: Running   CollectGridssMetrics    /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam
Fri May  6 08:17:30 EDT 2022: Running   SoftClipsToSplitReads|samtools  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam
Fri May  6 08:21:58 EDT 2022: Complete assembly /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired_hg38/PALZGU_T_N_paired_assembly_hg38.bam
Fri May  6 08:21:58 EDT 2022: Start calling     /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 08:21:58 EDT 2022: Running   IdentifyVariants        /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 08:26:58 EDT 2022: Running   AnnotateVariants        /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 09:03:24 EDT 2022: Running   AnnotateInsertedSequence        /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 09:05:28 EDT 2022: Complete calling  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz
Fri May  6 09:05:28 EDT 2022: Run complete with 80 warnings and 0 errors.

d-cameron commented 2 years ago

> I've replicated this issue with GRIDSS 2.12.2 and 2.13.2

Is the output VCF actually malformed, or is GRIPSS just complaining that it is? If you decompress the .vcf.gz, a) do you get any EOF errors during decompression, and b) does GRIPSS still complain about a malformed input file if you feed it the uncompressed .vcf?

The GRIDSS log files give no indication that there was any issue on the GRIDSS side. Given that all the GRIDSS intermediate steps read the intermediate VCFs without issue, it's likely that the cause is in the final GRIDSS annotation step, in GRIPSS, or somewhere in the pipeline structure/execution environment. Running GRIDSS with --keepTempFiles can help with this sort of root cause analysis, as it allows you to inspect all the intermediate files that GRIDSS uses and verify at which point something has gone wrong.
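The decompression check suggested above can be scripted. The following is only a sketch, not part of GRIDSS or GRIPSS: it assumes the output is a BGZF file (the block-gzip format that htsjdk's BlockCompressedInputStream reads), and the function names `has_bgzf_eof_marker` and `decompresses_cleanly` are hypothetical. A missing EOF marker suggests the writer did not close the file cleanly; a decompression error means the compressed stream itself is cut short or corrupt.

```python
import gzip
import zlib

# The 28-byte empty BGZF block defined in the SAM/BAM specification as the
# end-of-file marker; BGZF writers append it when the file is closed properly.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"
    "42430200" "1b00" "0300"
    "00000000" "00000000"
)


def has_bgzf_eof_marker(path):
    """Return True if the file ends with the empty BGZF EOF block."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # seek to the end to learn the file size
        size = handle.tell()
        if size < len(BGZF_EOF):
            return False
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF


def decompresses_cleanly(path, chunk_size=1 << 20):
    """Stream-decompress the whole file and return (ok, message)."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard; we only care whether an error is raised.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "no decompression errors"
```

Calling both functions on the final .vcf.gz, and on the intermediate files kept by --keepTempFiles, should narrow down which step first produced a truncated file.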

> notice the first line in the ALT field, there's a period there and GRIPSS then calls the file malformed

That's perfectly valid VCF (see section 5.4.9 of https://samtools.github.io/hts-specs/VCFv4.3.pdf), and GRIPSS is designed to handle VCFs that include single breakend variants. Any suggestions @charlesshale?
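To make the notation concrete, here is a rough sketch of how the ALT strings in the records above can be told apart. It is an illustration based on the breakend notation in the VCF specification, not code from GRIDSS or GRIPSS, and `classify_alt` is a hypothetical helper name.

```python
import re

# Breakpoint ALT: optional bases joined to a mate position inside a pair of
# identical square brackets, e.g. A[chr1:10359[ or ]chr1:10347]A.
_BREAKPOINT = re.compile(
    r"^([^\[\]]*)([\[\]])([^\[\]]+):(\d+)\2([^\[\]]*)$"
)


def classify_alt(alt):
    """Classify a VCF ALT string as a breakpoint, single breakend, or other."""
    if _BREAKPOINT.match(alt):
        return "breakpoint"
    # A single breakend has a '.' on exactly one side of the bases:
    # '.AACC' (partner unknown on the left) or 'TTAA.' (unknown on the right).
    if len(alt) > 1 and alt.startswith(".") != alt.endswith("."):
        return "single_breakend"
    return "other"


# ALT values copied from the records posted earlier in this thread.
for alt in (".AACCCTAACCN", "TTAACCCTAACCC.", "A[chr1:10359[", "]chr1:10347]A"):
    print(alt, classify_alt(alt))
```

The leading or trailing period is therefore part of the variant notation rather than a sign of corruption.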

d-cameron commented 2 years ago

The only unusual thing I can see on the GRIDSS side is WORKER_THREADS=32 instead of the recommended --threads 8 --jvmheap 31g, so there's potentially a hidden race condition that only shows up at high levels of parallelism. But if the output .vcf.gz isn't truncated, then that's not going to be the cause. The usual symptom when there are too many threads is progress stalling, followed by OutOfMemoryError: GC overhead limit exceeded. If the run completed without error, it's not going to be that.
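As an illustration of the recommended settings, the sketch below assembles a GRIDSS command line with 8 threads and a 31g heap. The flag spellings (-r, -w, -a, -o, --threads, --jvmheap, --keepTempFiles) are the ones that appear in this thread; `build_gridss_command` and the file names in the example call are hypothetical, so check them against the usage message of your installed version.

```python
import shlex


def build_gridss_command(reference, workdir, assembly_bam, output_vcf,
                         input_bams, threads=8, jvmheap="31g",
                         keep_temp_files=False):
    """Return the gridss argument list using the settings recommended above."""
    cmd = [
        "gridss",
        "-r", reference,
        "-w", workdir,
        "-a", assembly_bam,
        "-o", output_vcf,
        "--threads", str(threads),
        "--jvmheap", jvmheap,
    ]
    if keep_temp_files:
        # Keeps intermediate files so each step's output can be inspected.
        cmd.append("--keepTempFiles")
    cmd.extend(input_bams)  # input BAMs come last, as in the command in this thread
    return cmd


# Placeholder paths for illustration only.
example = build_gridss_command(
    "hg38.fa", "workdir/", "assembly.bam", "output.vcf.gz",
    ["normal.bam", "tumor.bam"], keep_temp_files=True,
)
print(shlex.join(example))
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems with long paths. On a cluster, the memory requested for the job would need to exceed the JVM heap by some margin.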

jamesdalg commented 2 years ago

At this point, I'm no longer experiencing truncated VCFs. It could have been a file system issue at the time; it's really hard to say. Maybe it was too many threads or not enough RAM. Thanks for the insight you gave earlier. Should I consider using a larger heap size (and requesting more RAM at job submission) as well as only 8 threads? This is what I experienced at the time ("unexpected end of file"):

(base) [dalgleishjl@cn0904 snakemake-gridss]$ zcat /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz | tail
gzip: /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALHRL_T_N_gridss_paired/gridss_PALHRL_T_N_paired_output_hg38.vcf.gz: ### **unexpected end of file**
chr1    790807  gridss0b_152b   G       .GAATGGACTCAAATGGAATAGAATTGACTCGAGTGGAAAG       223.84  ASSEMBLY_BIAS;LOW_QUAL;NO_RP    AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-16488;BEIDH=-1;BEIDL=103;BMQ=28.73;BMQN=20.00;BMQX=40.00;BQ=223.84;BSC=10;BSCQ=223.84;BUM=0;BUMQ=0.00;BVF=10;CAS=0;CASQ=0.00;CQ=286.84;EVENT=gridss0b_152;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=345;REFPAIR=224;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0        GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:5.814e-03:0.00:0:0:0:0.00:0:0.00:0.00:0:0:40.79:2:40.79:0:0.00:2:0.00:0:0.00:0.00:0.00:213:129:0:0.00:0:0.00:0        .:0.034:0.00:0:0:0:0.00:0:0.00:0.00:0:0:183.05:8:183.05:0:0.00:8:0.00:0:0.00:0.00:0.00:132:95:0:0.00:0:0.00:0
chr1    790853  gridss0f_147b   G       GTTTGGAAAGGACAAAAATGGAATGGAATAGAATGGAATGGAATGGAATG.     500.27  LOW_QUAL        AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=159.08;BASRP=8;BASSR=0;BEALN=chr14_GL000225v1_random:193015|+|8S41M|0;BEID=asm0-283;BEIDH=-1;BEIDL=0;BMQ=29.43;BMQN=20.00;BMQX=51.00;BQ=500.27;BSC=11;BSCQ=291.19;BUM=2;BUMQ=50.00;BVF=11;CAS=0;CASQ=0.00;CQ=758.48;EVENT=gridss0f_147;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=182;REFPAIR=210;RP=0;RPQ=0.00;SB=0.90909094;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0   GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:8.584e-03:0.00:0:0:0:0.00:0:0.00:36.93:2:0:87.87:2:50.93:0:0.00:2:0.00:0:0.00:0.00:0.00:118:113:0:0.00:0:0.00:0      .:0.053:0.00:0:0:0:0.00:0:0.00:122.14:6:0:412.40:9:240.25:2:50.00:9:0.00:0:0.00:0.00:0.00:64:97:0:0.00:0:0.00:0
chr1    790955  gridss0fb_128o  A       A[chr1:790966[  1726.77 NO_ASSEMBLY     AS=0;ASC=1X44N1X;ASQ=0.00;ASRP=0;ASSR=0;BA=0;BANRP=9;BANRPQ=163.55;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BMQ=25.00;BMQN=22.00;BMQX=27.00;BQ=67.09;BSC=1;BSCQ=18.09;BUM=2;BUMQ=49.00;BVF=3;CAS=0;CASQ=0.00;CIPOS=-22,23;CIRPOS=-22,23;CQ=1726.77;EVENT=gridss0fb_128;HOMLEN=45;HOMSEQ=GAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG;IC=46;IHOMPOS=-10,10;IQ=1508.63;MATEID=gridss0fb_128h;MQ=52.48;MQN=20.00;MQX=60.00;RAS=0;RASQ=0.00;REF=275;REFPAIR=164;RP=12;RPQ=218.15;SB=0.34042552;SC=68M59D95M37D42M1X44N1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=58    GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0.135:0.00:0:0:2:36.93:0:0.00:0.00:0:0:67.09:1:18.09:2:49.00:3:0.00:23:734.44:771.38:0.00:160:96:2:36.93:0:0.00:25   .:0.223:0.00:0:0:7:126.62:0:0.00:0.00:0:0:0.00:0:0.00:0:0.00:0:0.00:23:774.18:955.40:0.00:115:68:10:181.21:0:0.00:33
chr1    790966  gridss0fb_128h  A       ]chr1:790955]A  1726.77 NO_ASSEMBLY     AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BMQ=40.00;BMQN=40.00;BMQX=40.00;BQ=39.74;BSC=0;BSCQ=0.00;BUM=1;BUMQ=39.74;BVF=1;CAS=0;CASQ=0.00;CIPOS=-22,23;CIRPOS=-22,23;CQ=1726.77;EVENT=gridss0fb_128;HOMLEN=45;HOMSEQ=GAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG;IC=46;IHOMPOS=-10,10;IQ=1508.63;MATEID=gridss0fb_128o;MQ=52.48;MQN=20.00;MQX=60.00;RAS=0;RASQ=0.00;REF=179;REFPAIR=164;RP=12;RPQ=218.15;SB=0.3478261;SC=1X66M138D102M17D27M;SR=0;SRQ=0.00;SVTYPE=BND;VF=58 GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:0.185:0.00:0:0:0:0.00:0:0.00:0.00:0:0:39.74:0:0.00:1:39.74:1:0.00:23:734.44:771.38:0.00:110:96:2:36.93:0:0.00:25     .:0.324:0.00:0:0:0:0.00:0:0.00:0.00:0:0:0.00:0:0.00:0:0.00:0:0.00:23:774.18:955.40:0.00:69:68:10:181.21:0:0.00:33
chr1    790985  gridss0f_152b   A       ACACAAATTGAATGGAATGAAATGGAAC.   437.29  LOW_QUAL;NO_RP  AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=170.02;BASRP=0;BASSR=8;BEID=asm0-212;BEIDH=-1;BEIDL=0;BMQ=52.43;BMQN=24.00;BMQX=60.00;BQ=437.29;BSC=13;BSCQ=267.27;BUM=0;BUMQ=0.00;BVF=13;CAS=0;CASQ=0.00;CQ=493.24;EVENT=gridss0f_152;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=262;REFPAIR=244;RP=0;RPQ=0.00;SB=0.2857143;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0    GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF      .:0.023:0.00:0:0:0:0.00:0:0.00:107.53:0:5:252.39:7:144.87:0:0.00:7:0.00:0:0.00:0.00:0.00:151:140:0:0.00:0:0.00:0 .:0.027:0.00:0:0:0:0.00:0:0.00:62.49:0:3:184.90:6:122.41:0:0.00:6:0.00:0:0.00:0.00:0.00:111:104:0:0.00:0:0.00:0
chr1    791240  gridss0fb_135o  G       GCACGT[chr1:791246[     143.57  LOW_QUAL;SINGLE_ASSEMBLY        AS=1;ASC=1X;ASQ=118.57;ASRP=0;ASSR=5;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-159;BEIDH=0;BEIDL=0;BMQ=35.33;BMQN=33.00;BMQX=40.00;BQ=76.38;BSC=3;BSCQ=76.38;BUM=0;BUMQ=0.00;BVF=1;CAS=0;CASQ=0.00;CQ=143.57;EVENT=gridss0fb_135;IC=1;IHOMPOS=0,0;IQ=25.00;MATEID=gridss0fb_135h;MQ=32.50;MQN=25.00;MQX=40.00;RAS=0;RASQ=0.00;REF=768;REFPAIR=229;RP=0;RPQ=0.00;SB=1.0;SC=35M5D9M1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=5       GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:4.566e-03:49.82:0:2:0:0.00:0:0.00:0.00:0:0:32.63:1:32.63:0:0.00:1:0.00:0:0.00:49.82:0.00:436:119:0:0.00:0:0.00:2     .:8.955e-03:68.75:0:3:0:0.00:0:0.00:0.00:0:0:43.75:2:43.75:0:0.00:0:0.00:1:25.00:93.75:0.00:332:110:0:0.00:0:0.00:3
chr1    791246  gridss0fb_135h  G       ]chr1:791240]CACGTG     143.57  LOW_QUAL;SINGLE_ASSEMBLY        AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=5;BA=0;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-159;BEIDH=0;BEIDL=0;BQ=0.00;BSC=0;BSCQ=0.00;BUM=0;BUMQ=0.00;BVF=0;CAS=0;CASQ=0.00;CQ=143.57;EVENT=gridss0fb_135;IC=1;IHOMPOS=0,0;IQ=25.00;MATEID=gridss0fb_135o;MQ=32.50;MQN=25.00;MQX=40.00;RAS=1;RASQ=118.57;REF=650;REFPAIR=233;RP=0;RPQ=0.00;SB=1.0;SC=1X23M;SR=0;SRQ=0.00;SVTYPE=BND;VF=5     GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:5.376e-03:0.00:0:2:0:0.00:0:0.00:0.00:0:0:0.00:0:0.00:0:0.00:0:0.00:0:0.00:49.82:49.82:370:118:0:0.00:0:0.00:2        .:0.011:0.00:0:3:0:0.00:0:0.00:0.00:0:0:0.00:0:0.00:0:0.00:0:0.00:1:25.00:93.75:68.75:280:115:0:0.00:0:0.00:3
chr1    791320  gridss0b_171b   G       .ATGGAAAGGAATGGACCCGAATATCATGGAATAGAATGCAAAGG   668.99  ASSEMBLY_BIAS;LOW_QUAL  AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=0.00;BASRP=0;BASSR=0;BEID=asm0-16480;BEIDH=-1;BEIDL=248;BMQ=36.54;BMQN=21.00;BMQX=60.00;BQ=668.99;BSC=11;BSCQ=291.99;BUM=12;BUMQ=377.00;BVF=23;CAS=0;CASQ=0.00;CQ=3324.09;EVENT=gridss0b_171;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=111;REFPAIR=431;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0    GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.043:0.00:0:0:0:0.00:0:0.00:0.00:0:0:341.24:10:262.24:3:79.00:13:0.00:0:0.00:0.00:0.00:68:220:0:0.00:0:0.00:0        .:0.038:0.00:0:0:0:0.00:0:0.00:0.00:0:0:327.76:1:29.76:9:298.00:10:0.00:0:0.00:0.00:0.00:43:211:0:0.00:0:0.00:0
chr1    791361  gridss0f_168b   A       ATTTCAATGGACTTGAAAACAATGGAATGGAAGACAATGGAATG.   585.92  LOW_QUAL        AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=188.62;BASRP=7;BASSR=0;BEID=asm0-279;BEIDH=-1;BEIDL=0;BMQ=38.80;BMQN=26.00;BMQX=45.00;BQ=585.92;BSC=10;BSCQ=264.20;BUM=4;BUMQ=133.09;BVF=17;CAS=0;CASQ=0.00;CQ=1263.92;EVENT=gridss0f_168;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=245;REFPAIR=344;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0       GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.042:0.00:0:0:0:0.00:0:0.00:121.53:5:0:426.23:9:238.69:2:66.00:14:0.00:0:0.00:0.00:0.00:137:181:0:0.00:0:0.00:0      .:0.011:0.00:0:0:0:0.00:0:0.00:67.09:2:0:159.69:1:25.51:2:67.09:3:0.00:0:0.00:0.00:0.00:108:163:0:0.00:0:0.00:0
chr1    791543  gridss0b_173b   T       .GGACACAAATGGAATGGAAT   1357.74 LOW_QUAL        AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=5;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=845.55;BASRP=14;BASSR=25;BEID=asm0-16483,asm0-16487,asm0-16517,asm0-16524,asm0-167;BEIDH=-1,-1,-1,-1,-1;BEIDL=120,159,19,18,9;BMQ=36.41;BMQN=22.00;BMQX=60.00;BQ=1357.74;BSC=17;BSCQ=361.19;BUM=5;BUMQ=151.00;BVF=44;CAS=0;CASQ=0.00;CQ=3699.29;EVENT=gridss0b_173;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=386;REFPAIR=325;RP=0;RPQ=0.00;SB=0.1904762;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0        GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF       .:0.078:0.00:0:0:0:0.00:0:0.00:614.89:13:16:1004.52:11:238.62:5:151.00:34:0.00:0:0.00:0.00:0.00:232:170:0:0.00:0:0.00:0 .:0.031:0.00:0:0:0:0.00:0:0.00:230.65:1:9:353.22:6:122.57:0:0.00:1.00:1.00:1.00:124;RP=0;RPQ=0.00;11111111116;1117P=7;BASSR=0;BEID=as1=32.5Q=845.55;1111111111111111Q:211:0:0111111111116;1117Q=585.92;BSC=10;BSC100;SBasm0-167;1111111111111111QTGGAAGAC111111111116;1117.92;EVENT=gridss0f_1ANRPQX=60.00;B1111111111111111QANRP=0;B111111111116;1117;RPQ=0.00;SB=1.0;SC1RASQ:00;CQ=3691111111111111111Q279;BEID111111111116;1117RP:BANRPQ:BANSR:BAN10:0.0R=325;RP=1111111111111111Q64.20;BU111111111116;1117UAL:RASQ:REF:REFPAI111:0.SQ:ASRP:A11111111111111

What I am now dealing with is a malformed-VCF error. The following GRIPSS output calls the file malformed and gives specifics:
15:24:07.951 [WARN ] SV PON not ordered: last(157419-157419) vs this(157394-157424)
15:24:07.951 [WARN ] SV PON not ordered: last(165331-165331) vs this(165306-165346)
15:24:07.951 [INFO ] loaded 3103381 germline SV PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe)
15:24:08.857 [INFO ] loaded 1520513 germline SGL PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed)
15:24:08.872 [INFO ] loaded 446 known hotspot records from file
15:24:09.031 [INFO ] sample(PALZGU_T) processing VCF(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz)
15:24:09.033 [INFO ] genetype info: ref(0: PALZGU_N) tumor(1: PALZGU_T)
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 2894: there are 1 genotypes while the header requires that 2 genotypes be present for all records at chr1:10000
        at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
        at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:759)
        at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
        at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
        at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
        at com.hartwig.hmftools.gripss.filters.HardFilters.belowMinQual(HardFilters.java:46)
        at com.hartwig.hmftools.gripss.filters.HardFilters.isFiltered(HardFilters.java:35)
        at com.hartwig.hmftools.gripss.VariantBuilder.checkCreateVariant(VariantBuilder.java:59)
        at com.hartwig.hmftools.gripss.GripssApplication.processVariant(GripssApplication.java:307)
        at com.hartwig.hmftools.gripss.GripssApplication.lambda$processVcf$0(GripssApplication.java:141)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.hartwig.hmftools.gripss.GripssApplication.processVcf(GripssApplication.java:141)
        at com.hartwig.hmftools.gripss.GripssApplication.run(GripssApplication.java:108)
        at com.hartwig.hmftools.gripss.GripssApplication.main(GripssApplication.java:336)

(base) [dalgleishjl@cn3160 snakemake-gridss]$ gzcat /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz | head -n 2894 | tail -n 10
##contig=<ID=HPV-mKN1,length=7300>
##contig=<ID=HPV-mKN2,length=7299>
##contig=<ID=HPV-mKN3,length=7251>
##contig=<ID=HPV-mL55,length=7177>
##contig=<ID=HPV-mRTRX7,length=7731>
##contig=<ID=HPV-mSD2,length=7300>
##gridssVersion=2.12.2-gridss
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  PALZGU_N        PALZGU_T
chr1    10000   gridss0b_1b     N       .AACCCTAACCN    4500.73 NO_SR   AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=2259.04;BASRP=89;BASSR=0;BEID=asm0-27782;BEIDH=-1;BEIDL=10;BMQ=26.25;BMQN=20.00;BMQX=42.00;BQ=4500.73;BSC=0;BSCQ=0.00;BUM=86;BUMQ=2241.69;BVF=92;CAS=0;CASQ=0.00;CQ=4832.73;EVENT=gridss0b_1;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=23;REFPAIR=0;RP=0;RPQ=0.00;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0      GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:0.800:0.00:0:0:0:0.00:0:0.00:2259.04:89:0:4500.73:0:0.00:86:2241.69:92:0.00:0:0.00:0.00:0.00:23:0:0:0.00:0:0.00:0
chr1    10151   gridss0f_3b     T       TTAACCCTAACCC.  422.48  ASSEMBLY_BIAS;LOW_QUAL;NO_RP    AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=390.48;BASRP=16;BASSR=0;BEID=asm0-27779;BEIDH=-1;BEIDL=0;BMQ=35.00;BMQN=32.00;BMQX=38.00;BQ=422.48;BSC=1;BSCQ=32.00;BUM=0;BUMQ=0.00;BVF=17;CAS=0;CASQ=0.00;CQ=2040.61;EVENT=gridss0f_3;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=396;REFPAIR=1513;RP=0;RPQ=0.00;SB=1.0;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0 GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:8.827e-03:0.00:0:0:0:0.00:0:0.00:390.48:16:0:422.48:1:32.00:0:0.00:17:0.00:0:0.00:0.00:0.00:396:1513:0:0.00:0:0.00:0
(base) [dalgleishjl@cn3160 snakemake-gridss]$

This is the code that generated it:

        module load gridss samtools R java/17.0.2 bcftools repeatmasker;
        chmod -w /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz;
        echo 'PALZGU_N' > PALZGU_sample_names.txt;
        echo 'PALZGU_T' >> PALZGU_sample_names.txt;
        #REHEADER with correct samples
        bcftools reheader  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz -s PALZGU_sample_names.txt -o /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz;
        cp /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz /lscratch/$SLURM_JOB_ID/;
        mkdir -p /lscratch/$SLURM_JOBID/repeatmasker/;
        /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/scripts/gridss_annotate_vcf_repeatmasker         -w /lscratch/$SLURM_JOBID/repeatmasker         -t 32         -j /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_pipeline/gridss-2.13.2-gridss-jar-with-dependencies.jar         -o /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz         /lscratch/$SLURM_JOB_ID/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz;
        module load gridss samtools R java/17.0.2 bcftools repeatmasker; java -jar  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss_v2.1.jar  -sample PALZGU_T -reference PALZGU_N  -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa  -pon_sv_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe -pon_sgl_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed -known_hotspot_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe   -vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz  -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/;
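
One thing worth ruling out in the steps above is a mismatch between the number of names passed to `bcftools reheader -s` and the number of sample columns already in the input VCF. Whether that is the cause here is an assumption, not something the logs confirm. A minimal sketch of a pre-flight check, using only `gzip`, `awk` and `grep`; the function name `check_sample_count` is hypothetical:

```shell
# Hypothetical pre-flight check: succeed only if the VCF header declares as
# many sample columns as there are non-empty lines in the sample-names file.
check_sample_count() {
    vcf=$1
    names=$2
    # Sample columns are everything after the 9 fixed VCF columns.
    # gzip -cdf decompresses .gz input and passes plain text through unchanged.
    in_vcf=$(gzip -cdf "$vcf" | awk -F'\t' '/^#CHROM/ { print NF - 9; exit }')
    # Count the non-empty lines of the names file.
    in_names=$(grep -c . "$names")
    if [ "$in_vcf" != "$in_names" ]; then
        echo "sample count mismatch: VCF has $in_vcf, names file has $in_names" >&2
        return 1
    fi
}
```

It could be called as `check_sample_count input.vcf.gz PALZGU_sample_names.txt` before the reheader step, where `input.vcf.gz` stands for the GRIDSS output.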

Trying the uncompressed version gives the same error:

15:37:35.550 [WARN ] SV PON not ordered: last(165331-165331) vs this(165306-165346)
15:37:35.551 [INFO ] loaded 3103381 germline SV PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe)
15:37:36.435 [INFO ] loaded 1520513 germline SGL PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed)
15:37:36.437 [INFO ] loaded 446 known hotspot records from file
15:37:36.498 [INFO ] sample(PALZGU_T) processing VCF(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf)
15:37:36.501 [INFO ] genetype info: ref(0: PALZGU_N) tumor(1: PALZGU_T)
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 2894: there are 1 genotypes while the header requires that 2 genotypes be present for all records at chr1:10000
        at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
        at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:759)
        at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
        at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
        at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
        at com.hartwig.hmftools.gripss.filters.HardFilters.belowMinQual(HardFilters.java:46)
        at com.hartwig.hmftools.gripss.filters.HardFilters.isFiltered(HardFilters.java:35)
        at com.hartwig.hmftools.gripss.VariantBuilder.checkCreateVariant(VariantBuilder.java:59)
        at com.hartwig.hmftools.gripss.GripssApplication.processVariant(GripssApplication.java:307)
        at com.hartwig.hmftools.gripss.GripssApplication.lambda$processVcf$0(GripssApplication.java:141)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.hartwig.hmftools.gripss.GripssApplication.processVcf(GripssApplication.java:141)
        at com.hartwig.hmftools.gripss.GripssApplication.run(GripssApplication.java:108)
        at com.hartwig.hmftools.gripss.GripssApplication.main(GripssApplication.java:336)

Trying the version that has not been reheadered results in an error saying the sample names are missing from the VCF:

(base) [dalgleishjl@cn3160 snakemake-gridss]$         module load gridss samtools R java/17.0.2 bcftools repeatmasker; java -jar  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss_v2.1.jar  -sample PALZGU_T -reference PALZGU_N  -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa  -pon_sv_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe -pon_sgl_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed -known_hotspot_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe   -vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz  -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/;
15:38:50.205 [INFO ] loaded 3103381 germline SV PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe)
15:38:51.090 [INFO ] loaded 1520513 germline SGL PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed)
15:38:51.092 [INFO ] loaded 446 known hotspot records from file
15:38:51.340 [INFO ] sample(PALZGU_T) processing VCF(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz)
**15:38:51.341 [ERROR] missing sample names in VCF: [sample]**

Trying the version that has not had the repeats annotated results in the same error:

(base) [dalgleishjl@cn3160 snakemake-gridss]$         module load gridss samtools R java/17.0.2 bcftools repeatmasker; java -jar  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/gripss/gripss_v2.1.jar  -sample PALZGU_T -reference PALZGU_N  -ref_genome /data/CCRBioinfo/dalgleishjl/sv_mapping/hg38_ref/hg38.fa  -pon_sv_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe -pon_sgl_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed -known_hotspot_file /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/external_resources/HMFTools-Resources/Known-Fusions/38/known_fusions.38.bedpe   -vcf /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz  -output_dir /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/;

...
15:40:50.833 [WARN ] SV PON not ordered: last(157419-157419) vs this(157394-157424)
15:40:50.833 [WARN ] SV PON not ordered: last(165331-165331) vs this(165306-165346)
15:40:50.833 [INFO ] loaded 3103381 germline SV PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_breakpoint.38.bedpe)
15:40:51.741 [INFO ] loaded 1520513 germline SGL PON records from file(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss/ponhg38/gridss_pon_single_breakend.38.bed)
15:40:51.743 [INFO ] loaded 446 known hotspot records from file
15:40:51.845 [INFO ] sample(PALZGU_T) processing VCF(/data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz)
15:40:51.847 [INFO ] genetype info: ref(0: PALZGU_N) tumor(1: PALZGU_T)
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 2894: there are 1 genotypes while the header requires that 2 genotypes be present for all records at chr1:10000
        at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:887)
        at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:759)
        at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:121)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
        at htsjdk.variant.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:148)
        at htsjdk.variant.variantcontext.GenotypesContext.get(GenotypesContext.java:417)
        at htsjdk.variant.variantcontext.VariantContext.getGenotype(VariantContext.java:1102)
        at com.hartwig.hmftools.gripss.filters.HardFilters.belowMinQual(HardFilters.java:46)
        at com.hartwig.hmftools.gripss.filters.HardFilters.isFiltered(HardFilters.java:35)
        at com.hartwig.hmftools.gripss.VariantBuilder.checkCreateVariant(VariantBuilder.java:59)
        at com.hartwig.hmftools.gripss.GripssApplication.processVariant(GripssApplication.java:307)
        at com.hartwig.hmftools.gripss.GripssApplication.lambda$processVcf$0(GripssApplication.java:141)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.hartwig.hmftools.gripss.GripssApplication.processVcf(GripssApplication.java:141)
        at com.hartwig.hmftools.gripss.GripssApplication.run(GripssApplication.java:108)
        at com.hartwig.hmftools.gripss.GripssApplication.main(GripssApplication.java:336)

I can try looking at temp files if you like. What specifically should I look at?

I also stumbled on this. Maybe downstream tools can handle it, but vcf-validator flags a single line, 2893, as having the wrong number of fields. Maybe this is a good lead for finding what's wrong. I hope so!

(base) [dalgleishjl@cn3160 snakemake-gridss]$ vcf-validator  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz
The header tag 'reference' not present. (Not required but highly recommended.)
Wrong number of fieldsin /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.reheadered.vcf.gz; expected 11, got 10. The offending line was:
[chr1   10000   gridss0b_1b     N       .AACCCTAACCN    4500.73 NO_SR   AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=2259.04;BASRP=89;BASSR=0;BEID=asm0-27782;BEIDH=-1;BEIDL=10;BMQ=26.25;BMQN=20.00;BMQX=42.00;BQ=4500.73;BSC=0;BSCQ=0.00;BUM=86;BUMQ=2241.69;BVF=92;CAS=0;CASQ=0.00;CQ=4832.73;EVENT=gridss0b_1;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=23;REFPAIR=0;RP=0;RPQ=0.00;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0      GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:0.800:0.00:0:0:0:0.00:0:0.00:2259.04:89:0:4500.73:0:0.00:86:2241.69:92:0.00:0:0.00:0.00:0.00:23:0:0:0.00:0:0.00:0]

 at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 172, <__ANONIO__> line 2893.
        Vcf::throw(Vcf4_2=HASH(0x813f88), "Wrong number of fieldsin /data/CCRBioinfo/dalgleishjl/sv_mapp"...) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 507
        VcfReader::next_data_hash(Vcf4_2=HASH(0x813f88), ARRAY(0xaaec88)) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 3479
        Vcf4_1::next_data_hash(Vcf4_2=HASH(0x813f88), ARRAY(0xaaec88)) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 2586
        VcfReader::run_validation(Vcf4_2=HASH(0x813f88)) called at /usr/local/apps/vcftools/0.1.16/bin/vcf-validator line 60
        main::do_validation(HASH(0x7d3e18)) called at /usr/local/apps/vcftools/0.1.16/bin/vcf-validator line 14
(base) [dalgleishjl@cn3160 snakemake-gridss]$ vcf-validator  /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz
The header tag 'reference' not present. (Not required but highly recommended.)
Wrong number of fieldsin /data/CCRBioinfo/dalgleishjl/sv_mapping/gridss_purple_linx/PALZGU_T_N_gridss_paired/gridss_PALZGU_T_N_paired_output_hg38.vcf.gz.repeat.vcf.gz; expected 11, got 10. The offending line was:
[chr1   10000   gridss0b_1b     N       .AACCCTAACCN    4500.73 NO_SR   AS=0;ASC=1X;ASQ=0.00;ASRP=0;ASSR=0;BA=1;BANRP=0;BANRPQ=0.00;BANSR=0;BANSRQ=0.00;BAQ=2259.04;BASRP=89;BASSR=0;BEID=asm0-27782;BEIDH=-1;BEIDL=10;BMQ=26.25;BMQN=20.00;BMQX=42.00;BQ=4500.73;BSC=0;BSCQ=0.00;BUM=86;BUMQ=2241.69;BVF=92;CAS=0;CASQ=0.00;CQ=4832.73;EVENT=gridss0b_1;IC=0;IQ=0.00;RAS=0;RASQ=0.00;REF=23;REFPAIR=0;RP=0;RPQ=0.00;SC=1X;SR=0;SRQ=0.00;SVTYPE=BND;VF=0      GT:AF:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF        .:0.800:0.00:0:0:0:0.00:0:0.00:2259.04:89:0:4500.73:0:0.00:86:2241.69:92:0.00:0:0.00:0.00:0.00:23:0:0:0.00:0:0.00:0]

 at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 172, <__ANONIO__> line 2893.
        Vcf::throw(Vcf4_2=HASH(0x813f88), "Wrong number of fieldsin /data/CCRBioinfo/dalgleishjl/sv_mapp"...) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 507
        VcfReader::next_data_hash(Vcf4_2=HASH(0x813f88), ARRAY(0xaaef18)) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 3479
        Vcf4_1::next_data_hash(Vcf4_2=HASH(0x813f88), ARRAY(0xaaef18)) called at /usr/local/apps/vcftools/0.1.16/lib/perl5/site_perl/5.24.3/Vcf.pm line 2586
        VcfReader::run_validation(Vcf4_2=HASH(0x813f88)) called at /usr/local/apps/vcftools/0.1.16/bin/vcf-validator line 60
        main::do_validation(HASH(0x7d3e18)) called at /usr/local/apps/vcftools/0.1.16/bin/vcf-validator line 14
(base) [dalgleishjl@cn3160 snakemake-gridss]$
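
The "expected 11, got 10" message can be cross-checked across the whole file by tallying tab-separated columns on the `#CHROM` line and on every record. A minimal sketch, assuming a plain or gzip-compressed VCF; the function name `vcf_field_counts` is hypothetical:

```shell
# Hypothetical helper: report the column count of the #CHROM header line and
# how many data records have each column count.
vcf_field_counts() {
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$1" | awk -F'\t' '
        /^##/     { next }                       # skip meta-information lines
        /^#CHROM/ { print "header\t" NF; next }  # column count declared by the header
                  { n[NF]++ }                    # tally data records by column count
        END       { for (k in n) print "records\t" k "\t" n[k] }
    '
}
```

If every record reports 10 columns while the header reports 11, the problem is systematic (one genotype column for two declared samples) rather than a single damaged line.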
toddajohnson commented 2 years ago

James, see the updated GRIPSS README and https://github.com/hartwigmedical/hmftools/issues/238. From GRIPSS 2.0, the PON needs to be sorted by ChromosomeStart and PositionStart. If you have processed GRIDSS VCFs using GRIPSS with an unordered PON, then even if it finishes, it is probably not annotating correctly.
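
The ordering requirement can be spot-checked before running GRIPSS. A minimal sketch, assuming the PON file is tab-separated with the chromosome in column 1 and the start position in column 2 (an assumption about the file layout); the function name `pon_unordered_lines` is hypothetical:

```shell
# Hypothetical check: print every record whose start position is lower than
# that of the previous record on the same chromosome.
# Assumes column 1 = chromosome and column 2 = start position.
pon_unordered_lines() {
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    gzip -cdf "$1" | awk -F'\t' '
        $1 == chr && $2 + 0 < last { print NR ": " $0 }  # position went backwards
        { chr = $1; last = $2 + 0 }                      # remember the previous record
    '
}
```

An empty result suggests positions are non-decreasing within each chromosome. It does not check the order of the chromosomes themselves.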