dnanexus-archive / parliament2

Runs a combination of tools to generate structural variant calls on whole-genome sequencing data
Apache License 2.0

Add GRIDSS support #68

Open d-cameron opened 5 years ago

d-cameron commented 5 years ago

I'd like to add GRIDSS support and am happy to provide a PR. Actually getting GRIDSS to run looks relatively straightforward, but there are some downstream steps that I'm not sure how to handle.

It looks like I'd need to make the following changes:

Outstanding issues:

I'm pretty sure I've misinterpreted some basics of how Parliament2 works. Any pointers would be much appreciated.

AndrewCarroll commented 5 years ago

Hi Daniel,

I have been curious about adding GRIDSS to Parliament2 as well, but am too constrained to easily make it happen. Here are some thoughts:

1) Your list of changes looks roughly correct.

2) You will need to add a BWA index for assembly realignment. Because generating a BWA index takes a long time, I would probably do one of the following:

a) Start building the index at the start of the job, then check back later and launch GRIDSS in the per-chromosome steps once the index is complete. This approach would be good if GRIDSS multi-threads efficiently or if it otherwise can't be run correctly in a per-chromosome manner.

b) Include pre-built indices for a few common 37/38 flavors and only re-index if required.

c) Just take the performance hit. Note, though, that we have heavily optimized Parliament2 for speed, since scalability is one of its major goals, so a slow BWA index step will hurt.

3) For metrics, I would tar them into a GRIDSS-metrics.tar.gz file and add that as an output to Parliament2. This gives access to the files without further inflating the number of output files.

4) SV events are merged with SURVIVOR. The commands for this should be in the script itself. If you write a valid VCF, SURVIVOR should generally work with it. Let me know if you need more clarity. I would restrict the input to high-confidence events for this step.

5) I am not 100% sure of the best way to incorporate single breakends. I am going to think about this some more.

6) One important thing not present here is that Parliament2 uses caller overlap to assign confidence based on GIAB truth sets. This is very powerful for filtering events based on which callers make them. We'd need to make a new calibration table with GRIDSS. I think only I have done that so far, but it would be a good thing to democratize that process. This could be a good opportunity for that.
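To make the idea in point 6 concrete, here is a minimal sketch of how a caller-overlap calibration table could be computed. It is not Parliament2's actual procedure, which this thread does not describe. It assumes each merged event has already been annotated with the set of callers that support it and with whether it matches a truth set such as GIAB. The function name `calibration_table`, the input layout, and the caller names in the toy data are all hypothetical.

```python
from collections import defaultdict


def calibration_table(events):
    """Estimate precision for every observed combination of callers.

    `events` is an iterable of (callers, in_truth_set) pairs, where
    `callers` is a set of caller names supporting a merged event and
    `in_truth_set` says whether the event matches the truth set.
    This input layout is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # combination -> [matches, total]
    for callers, in_truth_set in events:
        key = frozenset(callers)
        counts[key][1] += 1
        if in_truth_set:
            counts[key][0] += 1
    # Precision = truth-set matches / all events with that caller combination.
    return {key: hits / total for key, (hits, total) in counts.items()}


# Toy data only; caller names and outcomes are illustrative, not real results.
toy_events = [
    ({"gridss", "caller_a"}, True),
    ({"gridss", "caller_a"}, True),
    ({"gridss", "caller_a"}, False),
    ({"gridss"}, True),
    ({"gridss"}, False),
    ({"caller_a", "caller_b"}, True),
]
toy_table = calibration_table(toy_events)
```

Each entry in the resulting dictionary could then be used as a confidence score for new events called by the same combination of tools. A real table would need enough truth-set events per combination for the estimates to be stable.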

Thanks!
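For reference, option a) in point 2 and the metrics tarball in point 3 could be sketched roughly as follows. This is an illustration under stated assumptions, not code from Parliament2. It assumes `bwa` is on the PATH and that indexing is started with `bwa index <reference>`. The helper names `start_index`, `index_ready`, and `tar_metrics` are hypothetical, as is the idea that the GRIDSS metrics files sit in a single directory.

```python
import subprocess
import tarfile
from pathlib import Path


def start_index(reference, command=("bwa", "index")):
    """Launch indexing in the background and return immediately."""
    return subprocess.Popen([*command, str(reference)])


def index_ready(proc):
    """Return True once indexing has finished successfully.

    Returns False while the process is still running and raises if it
    exited with a non-zero status.
    """
    code = proc.poll()  # None while the process is still running
    if code is None:
        return False
    if code != 0:
        raise RuntimeError(f"indexing exited with status {code}")
    return True


def tar_metrics(metrics_dir, archive="GRIDSS-metrics.tar.gz"):
    """Bundle every file under metrics_dir into one gzip-compressed tarball."""
    metrics_dir = Path(metrics_dir)
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(metrics_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to metrics_dir inside the archive.
                tar.add(path, arcname=path.relative_to(metrics_dir).as_posix())
    return Path(archive)
```

A job would call `start_index` once at startup, poll `index_ready` before launching GRIDSS, and call `tar_metrics` at the end so that the metrics appear as a single output file.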