genome-rcast / karkinos

Tumor genotyper that detects SNVs, absolute CNVs, and tumor content
Apache License 2.0

Add TerminalMismatch and SoftClipExtention filters #27

Closed: federkasten closed this 3 years ago

federkasten commented 3 years ago

This PR adds the following two filters.

1. Soft-clip extension

If the Y bases just inside a soft-clipped end contain more than X mismatches, the filter extends the soft clip inward to the innermost mismatch position. It does not update the alignments in the BAM file.
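
The rule above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the class name, method name, and the `boolean[]` mismatch representation are hypothetical, it handles only a clip at the left end of the read, and it assumes "more than X" means strictly greater than X.

```java
/** Illustrative sketch only; names and data layout are assumptions, not the PR's code. */
final class SoftClipExtensionSketch {

    /**
     * Returns the (possibly extended) length of a soft clip at the left end of a read.
     *
     * @param clipLen       current soft-clip length at the left end (0 means no clip)
     * @param mismatch      mismatch[i] is true if read base i differs from the reference;
     *                      only aligned bases (i >= clipLen) are inspected
     * @param checkLen      Y: number of aligned bases inspected just inside the clip
     * @param mismatchThres X: the clip is extended only if more than X of them mismatch
     */
    static int extendLeftClip(int clipLen, boolean[] mismatch, int checkLen, int mismatchThres) {
        if (clipLen <= 0) {
            return clipLen; // the rule applies only to soft-clipped ends
        }
        int windowEnd = Math.min(mismatch.length, clipLen + checkLen);
        int count = 0;
        int innermost = -1; // index of the mismatch farthest from the read end
        for (int i = clipLen; i < windowEnd; i++) {
            if (mismatch[i]) {
                count++;
                innermost = i;
            }
        }
        // Extend the clip so that it covers the innermost mismatch in the window.
        return count > mismatchThres ? innermost + 1 : clipLen;
    }
}
```

The method only returns a new clip length and leaves its inputs untouched, mirroring the note that the alignments in the BAM file are not rewritten.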

2. Double-ended mismatch

If the Y bases at each of the two ends of a read both contain more than X mismatched bases, the read is marked as NG (terminal mismatch read). NG reads are no longer used for variant calling.
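
A minimal sketch of this check, under the same assumptions as before: the class and method names are hypothetical, mismatches are given as a `boolean[]` over the read's bases, and "more than X" is read as strictly greater than X. It is not the implementation in `TerminalMismatch.java`.

```java
/** Illustrative sketch only; names and data layout are assumptions, not the PR's code. */
final class TerminalMismatchSketch {

    /**
     * Returns true if both ends of the read exceed the mismatch threshold.
     *
     * @param mismatch      mismatch[i] is true if read base i differs from the reference
     * @param checkLen      Y: number of bases inspected at each end
     * @param mismatchThres X: an end counts as bad if it has more than X mismatches
     */
    static boolean isTerminalMismatchRead(boolean[] mismatch, int checkLen, int mismatchThres) {
        int n = mismatch.length;
        int window = Math.min(checkLen, n); // short reads: the two windows may overlap
        int head = 0;
        int tail = 0;
        for (int i = 0; i < window; i++) {
            if (mismatch[i]) {
                head++;
            }
            if (mismatch[n - 1 - i]) {
                tail++;
            }
        }
        // The read is NG only when BOTH ends exceed the threshold.
        return head > mismatchThres && tail > mismatchThres;
    }
}
```

A read with many mismatches at only one end is not flagged by this sketch, which matches the "both ends" wording of the description.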

You can change this filtering behavior in Karkinos's property file.

e.g.

extraReadTerminalCheckLen=20
extraReadTerminalMismatchThres=2
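
For illustration, two such keys could be read with `java.util.Properties` along these lines. The class below is hypothetical and is not how `KarkinosProp.java` is written; the fallback values 20 and 2 are simply the example values above, not confirmed defaults.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative sketch only; not the actual KarkinosProp implementation. */
final class TerminalCheckConfigSketch {
    final int checkLen;      // value of extraReadTerminalCheckLen
    final int mismatchThres; // value of extraReadTerminalMismatchThres

    private TerminalCheckConfigSketch(int checkLen, int mismatchThres) {
        this.checkLen = checkLen;
        this.mismatchThres = mismatchThres;
    }

    /** Parses property-file text; missing keys fall back to the example values (an assumption). */
    static TerminalCheckConfigSketch parse(String propertyText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int checkLen = Integer.parseInt(
                props.getProperty("extraReadTerminalCheckLen", "20").trim());
        int mismatchThres = Integer.parseInt(
                props.getProperty("extraReadTerminalMismatchThres", "2").trim());
        return new TerminalCheckConfigSketch(checkLen, mismatchThres);
    }
}
```

Calling `TerminalCheckConfigSketch.parse("extraReadTerminalCheckLen=30\n")` would yield a check length of 30 and, in this sketch, fall back to 2 for the threshold.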

codecov-io commented 3 years ago

Codecov Report

Merging #27 (873dbd2) into master (df82c9d) will decrease coverage by 0.09%. The diff coverage is 3.84%.

@@             Coverage Diff              @@
##             master      #27      +/-   ##
============================================
- Coverage     11.28%   11.18%   -0.10%     
  Complexity      433      433              
============================================
  Files           132      134       +2     
  Lines         12597    12753     +156     
  Branches       2219     2257      +38     
============================================
+ Hits           1421     1427       +6     
- Misses        11108    11258     +150     
  Partials         68       68              

Impacted Files                                         Coverage Δ                Complexity Δ
.../ac/utokyo/rcast/karkinos/exec/TumorGenotyper.java  0.00% <0.00%> (ø)         0.00 <0.00> (ø)
.../ac/utokyo/rcast/karkinos/filter/FilterResult.java  0.00% <0.00%> (ø)         0.00 <0.00> (ø)
...tokyo/rcast/karkinos/filter/SoftClipExtention.java  0.00% <0.00%> (ø)         0.00 <0.00> (?)
...tokyo/rcast/karkinos/filter/SupportReadsCheck.java  0.00% <0.00%> (ø)         0.00 <0.00> (ø)
...utokyo/rcast/karkinos/filter/TerminalMismatch.java  0.00% <0.00%> (ø)         0.00 <0.00> (?)
...jp/ac/utokyo/rcast/karkinos/exec/KarkinosProp.java  56.93% <66.66%> (+0.68%)  1.00 <0.00> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update df82c9d...873dbd2.

federkasten commented 3 years ago

@xckitahara @alumi Could you please review and merge this PR? :pray:

The work in this PR is complete. I'd like to continue with further improvements to INDEL filtering in #28.

federkasten commented 3 years ago

Thanks :bow: @alumi and @xckitahara have reviewed this and given it an LGTM, so I'll merge this PR on my end.