AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Shortened line count with repeated start/end/ref/alt coordinates inserted #3

chapmanb closed this 9 years ago

chapmanb commented 9 years ago

Thanks for the work on the VarDict Java port. The speed improvements are very welcome, and I've been validating the latest version (1.2.1) against the DREAM ICGC-TCGA challenge data we previously used to validate VarDict Perl (http://bcb.io/2015/03/05/cancerval/).

I ran into an issue with truncated lines: some VarDictJava output lines have 37 columns instead of 51. A small reproducible test case is here, with the BAM files and a small shell script that you'll need to adjust to point to a local copy of GRCh37:

https://s3.amazonaws.com/chapmanb/az/vardict_line_count_problem.tar.gz

The output of both vardict and vardict-java is included. Specifically, the outputs differ at the call at 237754437: VarDictJava writes the leading start and end values correctly, but then repeats start/end/ref/alt in the middle of the line. The VarDict (Perl) fields are shown first and the VarDictJava fields second:

['syn3-tumor', '1', '1', '237754437', '237754437', 'C', 'T', '0', '0', '0', '0',
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '4', '2',
'1', '1', '0', '2', 'C/T', '0.500', '2;0', '13.5', '1', '32.0', '1', '60.0',
'4.000', '0.667', '0', '3.5', '0', '4.000', '3', 'CCCCTTCTCCTCCTCCCCCT',
'CTCCTCCTCCCCCTCCTCCT', '1:237753695-237754577', 'Deletion', 'SNV\n']

['syn3-tumor', '1', '1', '237754437', '237754437', 'C', 'T', '0', '0', '0', '0',
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
'237754437', '237754437', 'C', 'T', '0', '4.000', '3', 'CCCCTTCTCCTCCTCCCCCT',
'CTCCTCCTCCCCCTCCTCCT', '1:237753695-237754577', 'Deletion', 'SNV\n']
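To spot this kind of truncation in larger outputs, a quick per-line check may help. The following is a minimal Python sketch, not part of VarDict: it assumes tab-separated rows, treats 51 columns as the expected width (the Perl column count above, which may not hold for every VarDict mode or version), and assumes start/end/ref/alt sit at 0-based indices 3 to 6, as in the rows shown. The names `check_row` and `find_repeated_coords` are hypothetical.

```python
# Sketch: flag VarDict output rows that look truncated, as in this issue.
# Assumptions (hypothetical): tab-separated rows, 51 expected columns, and
# start/end/ref/alt at 0-based indices 3-6.

EXPECTED_COLUMNS = 51
COORD_SLICE = slice(3, 7)  # start, end, ref, alt


def find_repeated_coords(fields):
    """Return the index where start/end/ref/alt reappear later in the row, or -1."""
    coords = fields[COORD_SLICE]
    width = len(coords)
    for i in range(COORD_SLICE.stop, len(fields) - width + 1):
        if fields[i:i + width] == coords:
            return i
    return -1


def check_row(line, expected=EXPECTED_COLUMNS):
    """Return a list of problems found in one tab-separated output line."""
    fields = line.rstrip("\n").split("\t")
    problems = []
    if len(fields) != expected:
        problems.append(f"expected {expected} columns, got {len(fields)}")
    repeat_at = find_repeated_coords(fields)
    if repeat_at >= 0:
        problems.append(f"start/end/ref/alt repeated at index {repeat_at}")
    return problems


# Rebuild the two rows quoted above (Perl first, Java second).
head = ["syn3-tumor", "1", "1", "237754437", "237754437", "C", "T"] + ["0"] * 18
tail = ["4.000", "3", "CCCCTTCTCCTCCTCCCCCT", "CTCCTCCTCCCCCTCCTCCT",
        "1:237753695-237754577", "Deletion", "SNV"]
perl_row = head + ["4", "2", "1", "1", "0", "2", "C/T", "0.500", "2;0", "13.5",
                   "1", "32.0", "1", "60.0", "4.000", "0.667", "0", "3.5", "0"] + tail
java_row = head + ["237754437", "237754437", "C", "T", "0"] + tail

print(check_row("\t".join(perl_row) + "\n"))  # → []
print(check_row("\t".join(java_row) + "\n"))
# → ['expected 51 columns, got 37', 'start/end/ref/alt repeated at index 25']
```

Checking for the repeated coordinate tuple, not just the column count, ties a short row to the specific symptom reported here rather than to any truncation.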

Let me know if any other information would be helpful to debug this.

@mjafin @zhongwulai While tracking this down, I ran into a potential issue with the R code:

https://github.com/AstraZeneca-NGS/VarDict/blob/master/testsomatic.R#L5

If anything goes wrong, the tryCatch swallows all of the output, silently producing an empty VCF file. Do we ever expect it to be okay for the R data frame import to fail? If not, it would be good to remove the tryCatch and raise an error here. teststrandbias.R has the same issue. What do you think?
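The difference between swallowing the error and raising it can be sketched in Python, standing in for R here; this is not a port of testsomatic.R. Both hypothetical readers parse tab-separated rows with a fixed column count: the first mimics a catch-all handler and returns an empty result on any bad row, while the second lets the error propagate with the offending line number.

```python
# Sketch: a reader that swallows errors versus one that fails loudly.
# Python stands in for R here; function names are hypothetical.


def parse_row(line, ncols, lineno=None):
    """Split one tab-separated row, raising if the column count is wrong."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != ncols:
        where = f"line {lineno}: " if lineno is not None else ""
        raise ValueError(f"{where}expected {ncols} columns, got {len(fields)}")
    return fields


def read_rows_swallowing(lines, ncols):
    """Anti-pattern: one malformed row silently turns the whole result empty."""
    try:
        return [parse_row(line, ncols) for line in lines]
    except ValueError:
        return []  # downstream code would then write an empty VCF


def read_rows_strict(lines, ncols):
    """Preferred: let the error propagate, tagged with the offending line."""
    return [parse_row(line, ncols, n) for n, line in enumerate(lines, start=1)]


rows = ["chr1\t100\tC\n", "chr1\t200\n"]  # the second row is truncated
print(read_rows_swallowing(rows, 3))  # → []
try:
    read_rows_strict(rows, 3)
except ValueError as err:
    print(f"error: {err}")  # → error: line 2: expected 3 columns, got 2
```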

Thanks for looking at this.

mjafin commented 9 years ago

Big thanks for debugging this, Brad! What do you reckon would be the most informative way of failing in the R code? Exiting with a non-zero exit code?

chapmanb commented 9 years ago

Miika, my crude thought was to remove the tryCatch and just let it fail with an error message. The error was pretty useful for debugging here, so hopefully it will at least give us a clue about future problems so we can fix them upstream. If there are cases where we expect failure to be okay, we can enumerate and handle those. What do you think?
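Letting the error propagate is usually enough to get a non-zero exit status without extra handling code. A minimal illustration in Python, which stands in for the R scripts here (the equivalent Rscript behavior is an assumption worth confirming separately): a child process that raises an uncaught exception exits with status 1 and leaves the error message on stderr. The failing one-liner is hypothetical.

```python
# Sketch: an uncaught error already yields a non-zero exit status.
# The child "script" is a hypothetical stand-in for a failing data import.
import subprocess
import sys

FAILING_SCRIPT = "raise ValueError('could not parse VarDict output')"

result = subprocess.run(
    [sys.executable, "-c", FAILING_SCRIPT],
    capture_output=True,
    text=True,
)
# Python exits with status 1 on an uncaught exception and writes the
# traceback to stderr, so the caller can detect the failure.
print(result.returncode)  # → 1
print(result.stderr.strip().splitlines()[-1])
# → ValueError: could not parse VarDict output
```

A calling bash wrapper run with `set -e` would then stop at that step instead of carrying on with an empty VCF.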

mjafin commented 9 years ago

Brad, I believe this and issue #4 have been addressed in the latest release (candidate). If the latest additions are to your liking, let's close these tickets.

mjafin commented 9 years ago

~100% concordance in the latest revision: http://imgur.com/njwE9tG
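One simple way to put a number on concordance between the two ports is sketched below. It assumes calls are already keyed by (chrom, pos, ref, alt) and scores the Jaccard index (shared calls over all calls); this is an illustrative choice, not necessarily how the linked plot was produced. The second example uses a made-up call purely to show the arithmetic.

```python
# Sketch: concordance between two call sets as |shared| / |union|.
# Assumptions (hypothetical): calls keyed by (chrom, pos, ref, alt);
# not necessarily the metric behind the linked plot.
from typing import Set, Tuple

Call = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def concordance(perl_calls: Set[Call], java_calls: Set[Call]) -> float:
    """Return |shared| / |union|; two empty call sets count as fully concordant."""
    union = perl_calls | java_calls
    if not union:
        return 1.0
    return len(perl_calls & java_calls) / len(union)


# The call discussed in this issue appears in both outputs quoted above.
shared_call = ("1", 237754437, "C", "T")
print(concordance({shared_call}, {shared_call}))  # → 1.0

# A hypothetical extra call seen by only one caller halves the score.
extra_call = ("1", 237754500, "A", "G")  # made-up, for illustration only
print(concordance({shared_call}, {shared_call, extra_call}))  # → 0.5
```

Other choices, such as the fraction of Perl calls reproduced by the Java port, would weight missed and extra calls differently.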