Open gseryogin opened 1 year ago
Does that mean that gridss can produce different results on a sample rerun and has stochastic properties?
It appears that there are different IDs in the output. Look at this:
BEID=asm170-4188,asm172-5972 (first run)
vs
BEID=asm170-4188,asm172-6016 (second run)
seems to suggest that the same samples were not run
I have reproduced this behaviour, running the same files 4 times gives different qualities for same variants. Also variant coordinates can change by 1-2bp.
I have reproduced this behaviour, running the same files 4 times gives different qualities for same variants. Also variant coordinates can change by 1-2bp.
thanks for testing, that sounds very worrying indeed!
It looks like there's a difference of 6 split read that were assembled in the first but not second run. This seems to indicate that #450 did not fully fix the issue of multi-threaded determinism. Can you confirm: a) what version(s) of GRIDSS were run b) what version of dependent tools (e..g bwa) were run (the start GRIDSS log file will contain this) c) whether the input.bam.gridss.working/input.bam.sv.bam files are identical between the two runs d) whether there were any command-line differences (e.g. changing the order of the input file can change the output because the reads will be processed in a different order)
It appears that there are different IDs in the output. Look at this: BEID=asm170-4188,asm172-5972 (first run) vs BEID=asm170-4188,asm172-6016 (second run) seems to suggest that the same samples were not run
This indicates that the assembly sub-process (asm172) processing the RHS had output a different number of assemblies by the time it got to chr10:43111683. That is, the root cause could be any difference in the 10Mb leading up the chr10:43111683. That said, if the inputs are identical then the output should be identical which clearly is not the case here.
Note that 'identical' inputs from the assembly sub-process is not necessarily the same as identical inputs to gridss overall. The GRIDSS pre-processing does a (bwa) realignment of soft-clipped reads (required to support for bowtie2 inputs, optional for bwa-aligned inputs). The input split reads/discordant read pairs to the asm/variant calling steps are the reads in the input.bam.gridss.working/input.bam.sv.bam files, not the input files given on the command line. Any non-determinism here will propagate downstream to asm & calling.
I have the same problem. Here are the results of two runs on the same files:
chr10 32016769 gridss170bb_38o C [chr10:43111683[C 512.74 PASS ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X87M;ASQ=162.56;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6279;BEIDH=0,88;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=512.74;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=187.62;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.45;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913 GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0 .:0.455:0:0:0:0:162.56:1:10:0:0:0:0:0:0:0:0:0:0:0:0:423.85:98.73:6:0:1:14.65:6:147.91:5 .:0.214:0:0:0:0:0:0:4:0:0:0:0:0:0:0:0:0:0:0:0:88.89:88.89:11:0:0:0:0:0:3
chr10 43111683 gridss170bb_38h C [chr10:32016769[C 512.74 PASS ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=187.62;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6279;BEIDH=80,0;BEIDL=0,88;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=9;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=512.74;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.6;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913 GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0 .:0.0007898:0:0:0:0:98.73:1:10:0:0:0:204.38:6:144.38:1:60:2:0:0:0:423.85:162.56:6326:0:1:14.65:6:147.91:5 .:0.0004537:0:0:0:0:88.89:0:4:0:0:0:262.89:9:202.89:1:60:6:0:0:0:88.89:0:6609:0:0:0:0:0:3
chr10 32016769 gridss170bb_38o C [chr10:43111683[C 494.4 LOW_QUAL ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X100M;ASQ=162.56;ASRP=1;ASSR=13;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6059;BEIDH=0,102;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=494.4;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=169.28;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.421053;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=5;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913 GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0 .:0.455:0:0:0:0:162.56:1:13:0:0:0:0:0:0:0:0:0:0:0:0:494.4:169.28:6:0:1:14.65:6:147.91:5 .:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:11:0:0:0:0:0:0
chr10 43111683 gridss170bb_38h C [chr10:32016769[C 494.4 LOW_QUAL ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=169.28;ASRP=1;ASSR=13;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6059;BEIDH=80,0;BEIDL=0,102;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=12;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=494.4;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.588235;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=5;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913 GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0 .:0.0007898:0:0:0:0:169.28:1:13:0:0:0:204.38:6:144.38:1:60:2:0:0:0:494.4:162.56:6326:0:1:14.65:6:147.91:5 .:0:0:0:0:0:0:0:0:0:0:0:262.89:9:202.89:1:60:9:0:0:0:0:0:6609:0:0:0:0:0:0
Assemblies that differ between runs: asm_1.txt asm_2.txt All input.bam.sv.bam files are identical, the only differences are in sv-assembly files. Both runs have been launched with the same commands. Versions of software:
Hi! Having one normal and two tumor bam files I got different calling results for a clinically important gene fusion KIF5B-RET.
These are records of two breakpoints, called first time from the specified set of bam files:
And these are records of the same two breakpoints, called second time from the same set of bam files:
QUAL, several INFO and FORMAT fields got different values (more noticable for the second tumor sample), after reruning gridss with exactly the same set of input bams and parameters. Especially concerning is the difference in read counts supporting varaint allele (VF), which are different and became considerably lower after second rerun (was 11 became 8).
Could you, please, let me know, is there any chance to avoide that sort of stochasticity, or, if this behaviour is an inevitable, give a piece of advice on how to cope with that? This is an inportant isssue, since it may affect somatic filters and and increase the number of false negative variants.