bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License
994 stars 354 forks source link

Support Freebayes tumor/normal somatic calling #204

Closed lbeltrame closed 10 years ago

lbeltrame commented 10 years ago

I noticed that Freebayes now supports somatic/tumor calling. I think, given it's a fairly mature tool, that support should be implemented in the pipeline

This is of course related to issue #112.

Given that I work a lot on tumor samples, I'm willing to do the job. Where can I look for how freebayes is called?

chapmanb commented 10 years ago

Luca; Good timing, I literally added this to the todo list this morning along with a link to a discussion about recommended parameters for calling pairs:

https://github.com/chapmanb/bcbio-nextgen/blob/master/TODO.md

It looks worth exploring despite the current rough edges. From my side I'll work to add vcflib into the third-party tool installs so we have the required command line vcf manipulation tools available.

The current freebayes code is here:

https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py

Thanks for looking at this.

mjafin commented 10 years ago

+1 for this - happy to help in testing at least!

chapmanb commented 10 years ago

I added vcflib as a homebrew package, so the tools there are now available for using within bcbio-nextgen. For tumor/normal calling, this provides vcfsamplediff. This also allows use of vcfintersect and vcfcreatemulti for VCF file merging (#155).

lbeltrame commented 10 years ago

I'm running standalone freebayes for some initial testing and I'm encountering the same problem as reported in this thread (http://www.biostars.org/p/85835/).

I'll investigate further.

lbeltrame commented 10 years ago

I successfully made a standalone freebayes run with the recommended settings. Brad, do you want the support implemented like in VarScan, where it is decided "on the fly" whether to go for the paired tumor/normal or "regular" routes?

chapmanb commented 10 years ago

Luca; Great news, thanks. I agree, the VarScan approach of going to paired tumor/normal if you have two samples and a cancer phenotype listed sounds perfect. It would be worth abstracting the logic for that into a single function shared across the two implementations so they're consistent (something like is_paired_analysis). Thanks again.

lbeltrame commented 10 years ago

Something like the check done in the VarScan call?

 if len(align_bams) == 2 and all(item["metadata"].get("phenotype")
                                 is not None for item in items):

For getting the right paired/normal combination I made get_paired_bams which is already used (but MuTect needs to be ported to that, IIRC).

chapmanb commented 10 years ago

Exactly. It would be great to define all this logic in only one place, so you could do:

def is_paired_analysis(align_bams, items):
    try:
        get_paired_bams(align_bams, items)
        return True
    except ValueError:
        return False

We should also make get_paired_bams return a better flag or raise a more specific error than a ValueError to make this cleaner. My suggestion would be to have get_paired_bams return a namedtuple instead of the bare tuple and then return None if it's not a cancer sample.

lbeltrame commented 10 years ago

Good suggestions, I'll see about adding that as well.

lbeltrame commented 10 years ago

I'm starting to lay out the groundwork: one question I have is if you prefer piping over creating intermediates (both can be done).

Another question is: how do you handle programs that output to stdout? Where I can look for examples?

chapmanb commented 10 years ago

Piping would be preferable where possible. FreeBayes tools normally have very good support for this. Generally the approach I take now is to write as much as possible within a block as you'd normally do on the command line:

https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/ngsalign/bwa.py#L28

so for writing the final stdout to a file you'd just use a redirect:

https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/structural/lumpy.py#L129

Thanks again.

lbeltrame commented 10 years ago

This has been done, so I'm closing this report.