Closed lbeltrame closed 10 years ago
Luca; Good timing, I literally added this to the todo list this morning along with a link to a discussion about recommended parameters for calling pairs:
https://github.com/chapmanb/bcbio-nextgen/blob/master/TODO.md
It looks worth exploring despite the current rough edges. From my side I'll work to add vcflib into the third-party tool installs so we have the required command line vcf manipulation tools available.
The current freebayes code is here:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py
Thanks for looking at this.
+1 for this - happy to help in testing at least!
I added vcflib as a homebrew package, so the tools there are now available for using within bcbio-nextgen. For tumor/normal calling, this provides vcfsamplediff
. This also allows use of vcfintersect
and vcfcreatemulti
for VCF file merging (#155).
I'm running standalone freebayes for some initial testing and I'm encountering the same problem as reported in this thread (http://www.biostars.org/p/85835/).
I'll investigate further.
I successfully made a standalone freebayes run with the recommended settings. Brad, do you want the support implemented like in VarScan, where it is decided "on the fly" whether to go for the paired tumor/normal or "regular" routes?
Luca;
Great news, thanks. I agree, the VarScan approach of going to paired tumor/normal if you have two samples and a cancer phenotype listed sounds perfect. It would be worth abstracting the logic for that into a single function shared across the two implementations so they're consistent (something like is_paired_analysis
). Thanks again.
Something like the check done in the VarScan call?
if len(align_bams) == 2 and all(item["metadata"].get("phenotype")
is not None for item in items):
For getting the right paired/normal combination I made get_paired_bams
which is already used (but MuTect needs to be ported to that, IIRC).
Exactly. It would be great to define all this logic in only one place, so you could do:
def is_paired_analysis(align_bams, items):
try:
get_paired_bams(align_bams, items)
return True
except ValueError:
return False
We should also make get_paired_bams
return a better flag or raise a more
specific error than a ValueError to make this cleaner. My suggestion would be to
have get_paired_bams
return a namedtuple instead of the bare tuple and then
return None if it's not a cancer sample.
Good suggestions, I'll see about adding that as well.
I'm starting to lay out the groundwork: one question I have is if you prefer piping over creating intermediates (both can be done).
Another question is: how do you handle programs that output to stdout? Where I can look for examples?
Piping would be preferable where possible. FreeBayes tools normally have very good support for this. Generally the approach I take now is to write as much as possible within a block as you'd normally do on the command line:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/ngsalign/bwa.py#L28
so for writing the final stdout to a file you'd just use a redirect:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/structural/lumpy.py#L129
Thanks again.
This has been done, so I'm closing this report.
I noticed that Freebayes now supports somatic/tumor calling. I think, given it's a fairly mature tool, that support should be implemented in the pipeline
This is of course related to issue #112.
Given that I work a lot on tumor samples, I'm willing to do the job. Where can I look for how freebayes is called?