bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

ensemble calling filter variants not behaving as predicted #1826

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hey,

I want to combine variants from multiple callers using ensemble calling, and I want to include all called variants, not only those flagged as 'PASS'. I used the following configuration, but everything in my ensemble callset is still flagged as 'PASS'.

`details:

There's no definitive list of arguments for the 'use_filtered' field in the documentation that I can find, so I assumed setting it to 'true' would allow me to use all called variants. Is this not correct?

Additionally, is there an option to do ensemble calling on all variants except germline calls?

Best wishes

Nick

chapmanb commented 7 years ago

Nick; Thanks for the questions. It looks like you have use_filtered set correctly. You can verify this by looking at the bcbio-variation-recall ensemble command in log/bcbio-nextgen-commands.log: it shouldn't have the --nofiltered flag passed. With use_filtered enabled, all variants in the input files are treated as PASS, so they'll appear as PASS in the ensemble file as well. I'm not sure this is a useful option for somatic calling; it is intended more for germline uses, since you'll now be mixing germline, somatic and noisy calls in your final ensemble output.

You'd be better off leaving off the use_filtered parameter and instead calling separately for somatic and germline values so you get an ensemble file for each:

http://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#somatic-with-germline-variants

Hope this provides the output you're looking for.
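The log check described above can be scripted. This is only a minimal sketch: the log path and the --nofiltered flag come from this thread, but the function name `check_ensemble_commands` is made up for illustration, and it assumes each command is written to the log on a single line, which may not match the real log layout.

```python
from pathlib import Path


def check_ensemble_commands(log_text):
    """Return (command_line, has_nofiltered) for each line that looks like a
    bcbio-variation-recall ensemble command.

    Assumption: every command sits on one line of the log.
    """
    results = []
    for line in log_text.splitlines():
        tokens = line.split()
        if "bcbio-variation-recall" in line and "ensemble" in tokens:
            results.append((line.strip(), "--nofiltered" in tokens))
    return results


if __name__ == "__main__":
    # Path taken from the comment above, relative to the bcbio work directory.
    log_path = Path("log/bcbio-nextgen-commands.log")
    if log_path.exists():
        for command, has_flag in check_ensemble_commands(log_path.read_text()):
            status = "--nofiltered present" if has_flag else "--nofiltered absent"
            print(f"{status}: {command}")
```

If use_filtered has been picked up, every ensemble command should be reported with --nofiltered absent.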

ghost commented 7 years ago

Hey,

Thanks a lot for your help. Having run my analysis again, it appears that the --nofiltered flag is still being passed. I have attached a screenshot of my YAML and of the command. I'm not trying to identify strictly high-confidence variants, and it doesn't particularly matter whether they are germline or somatic, which is why I'm opting to include all variants called by multiple callers, not just those flagged as PASS. I'm hoping to present this work at a conference in a few weeks, so I was also wondering how I should cite bcbio-nextgen in my slides/abstract?

Best wishes,

Nick


Nicholas Younger PhD Student (Boulter/Sproul Labs) MRC Human Genetics Unit The Institute of Genetics and Molecular Medicine Western General Hospital Crewe Road South EH4 2XU

http://www.ed.ac.uk/profile/nicholas-younger


chapmanb commented 7 years ago

Nick; I don't think you can attach files via GitHub e-mail so I don't see them here. If you could attach them via the GitHub web interface I can hopefully provide more help to figure out what is going wrong.

Thanks for citing bcbio. The best approach is to link to the main page of write-ups (http://bcb.io) and the code (https://github.com/chapmanb/bcbio-nextgen).

lpantano commented 7 years ago

Hi Nick,

I'm closing this on the assumption that you got all the information you needed. Let us know if you run into more issues.

Thanks!