Ensembl / Bio-DB-HTS

Git repo for Bio::DB::HTS module on CPAN, providing Perl links into HTSlib
Apache License 2.0

* Implemented initial BAM writing capabilities #34

Closed by riederd 8 years ago

riederd commented 8 years ago

Hi Rishi, as you suggested via e-mail, I created a pull request that adds BAM writing and sorting capabilities. I did some testing and it seems to work. I hope my additions find their way into the main branch.

Dietmar

rishidev commented 8 years ago

Hi Dietmar

Thanks for the first draft of the write support. There are a couple of high-level points to resolve before getting into details, so that the change fits in with the philosophy this library has to support.

Firstly, the sort/merge functionality needs to be removed. The motivation for splitting SAMtools into HTSlib and SAMtools was that HTSlib provides the low-level file read/write access and SAMtools performs any further analysis required, including the sort functions. Similarly, Bio::DB::HTS is meant to provide an interface into HTSlib, so sorting is beyond the intended scope of this library.

Secondly, the library needs to support CRAM as well as BAM, so the functions that write a header and a read need to work for CRAM files too. This is why the task needs some careful investigation. In particular, the BAM and CRAM headers are very different beasts, and may need implementing through different parts of the HTSlib API.

Thirdly, we need some test cases. These go in the t directory, and it will probably be best to add a 07bamwrite.t. It should contain a test that writes reads out to a BAM file, closes the file handle, reads them back in, and compares what was read with what was written. It should also contain some sort of location-based retrieval. As one of the aims of the library is to be BAM/CRAM agnostic, we should do the same for a CRAM file.

Thanks for having a go - let me know if this is something you feel you can proceed with,

Rishi