bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

OptiType for RNA-Seq data #1576

Closed rdocking closed 8 years ago

rdocking commented 8 years ago

Hi there:

I'm interested in using the OptiType HLA genotyper with RNA-Seq data. I've done this using the stand-alone OptiType tool with good results so far, but I'd like to try scaling it up within the context of bcbio.

From a quick look at the code, the relevant section in optitype.py is:

        cmd = ("OptiTypePipeline.py -v --dna {opts} -o {tx_out_dir} "
                "-i {hla_fq} -c {config_file}")

Here, --dna would need to be changed to --rna to make this work. I could help set up a pull request to include this if there's interest. Thoughts? Thanks for all your work on bcbio!
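As a rough illustration of the flag change described above, the command could be built with the data type as a parameter. This is only a sketch: `build_optitype_cmd` and its arguments are hypothetical names, not bcbio's actual code, and it assumes only the OptiType options already shown in the snippet (`-v`, `--dna`/`--rna`, `-o`, `-i`, `-c`).

```python
import shlex


def build_optitype_cmd(hla_fqs, out_dir, config_file, data_type="dna", extra_opts=""):
    """Hypothetical helper: assemble an OptiTypePipeline.py command line.

    data_type selects between the --dna and --rna modes of OptiType.
    """
    if data_type not in ("dna", "rna"):
        raise ValueError("data_type must be 'dna' or 'rna', got %r" % data_type)
    parts = ["OptiTypePipeline.py", "-v", "--" + data_type]
    # Pass through any additional options unchanged.
    parts += shlex.split(extra_opts)
    parts += ["-o", out_dir, "-i"] + list(hla_fqs) + ["-c", config_file]
    # Quote each argument so paths with spaces survive a shell invocation.
    return " ".join(shlex.quote(p) for p in parts)


cmd = build_optitype_cmd(["hla.fq"], "out", "config.ini", data_type="rna")
print(cmd)
```

Switching the flag is the easy part; where the input fastq files come from for RNA-seq is a separate question.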

chapmanb commented 8 years ago

Thanks for the interest in adding OptiType to RNA-seq. Unfortunately this is going to be a bit bigger project to integrate within bcbio. We'd need to do two things:

We're happy to help as you implement this. Apologies that it'll be a bigger project than initially envisioned, but I hope this helps for getting started.

roryk commented 8 years ago

How are you extracting the HLA reads for variant calling?

chapmanb commented 8 years ago

For variant calling, we use hg38 HLA alternative contigs and bwa extracts only the HLA-aligning reads into separate fastq files that we can feed into OptiType. We'd need some equivalent upfront process to avoid needing to pre-process the full set of reads to extract only HLA aligners for RNA-seq, but I'm not sure what the best approach is.
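To make the idea of pulling out HLA-aligning reads concrete, here is a minimal sketch that filters SAM records by contig name and writes them back out as FASTQ. It is not bcbio's actual implementation. It assumes the HLA alternative contigs have names starting with "HLA-" (an assumption about the reference naming), the function names are hypothetical, and it ignores mates that align outside the HLA contigs.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hla_reads_to_fastq(sam_lines, prefix="HLA-"):
    """Yield FASTQ records for primary alignments to contigs named prefix*."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        seq, qual = fields[9], fields[10]
        if not rname.startswith(prefix):
            continue
        if flag & 0x900:
            continue  # drop secondary (0x100) and supplementary (0x800) records
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; restore
            # the original read orientation for the FASTQ output.
            seq, qual = revcomp(seq), qual[::-1]
        if flag & 0x40:
            name += "/1"  # first read of a pair
        elif flag & 0x80:
            name += "/2"  # second read of a pair
        yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

Whether this kind of filter would recover enough HLA reads from spliced RNA-seq alignments is an open question in this thread.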

rdocking commented 8 years ago

Thanks for your response. It does indeed sound like this would be a bigger project than I was envisioning.

When I've run this in the past, I haven't done any HLA read extraction prior to running OptiType. This means that all the reads end up being re-aligned (and loaded into memory) as part of the OptiType run - I can see that this may cause run-time / memory issues if this approach was taken within bcbio.

I haven't tested using bwa on hg38 to extract HLA-aligning reads from RNA-Seq data - I'm not sure whether it would work or not (or whether there's some other pre-processing approach that might work for bcbio).

Perhaps best to close this issue for now then. If I end up needing to scale this up dramatically, I can investigate read pre-processing steps that would allow for simpler integration with bcbio.

chapmanb commented 8 years ago

Thanks for following up on this. We'd definitely be happy to work with you if you want to re-explore this, just check back in and we'll reopen and tackle it. Thanks again.