Ensembl / Bio-DB-HTS

Git repo for Bio::DB::HTS module on CPAN, providing Perl links into HTSlib
Apache License 2.0
24 stars 16 forks source link

Impelement write capability #35

Closed riederd closed 8 years ago

riederd commented 8 years ago

Hi Rishi,

I followed the suggestions on my last pull request (#34). Shortly:

I hope this of help to others too and is now better suited for merging into the main branch.

Dietmar

rishidev commented 8 years ago

Many thanks

jmarshall commented 8 years ago

Error: need reference sequence file for writing CRAM file

This is not true: when writing CRAM, if the @SQ headers contain UR or M5 fields, a reference sequence file is not necessary. Most outside-htslib code should not need to know whether a file is SAM, BAM, or CRAM (and if it does it should use hts_get_format(), not htsfile->format.format), so I suggest you just remove the if( htsfile->format.format == cram ) and else croak.

If it were me, I would add another Bio::DB::HTSfile method that wrapped hts_set_fai_filename() instead of having an optional header_write argument, as it can be useful in other circumstances than writing headers as well (e.g., when reading SAM).

rishidev commented 8 years ago

Hi @jmarshall, thanks for those comments. I've just got some questions about implementing further updates.

I think we will need some CRAM specific checking to see if the reference is specified somehow. The user may be creating a file from a blank template or if converting from a SAM/BAM the M5/UR headers lines are optional. But I will update to use hts_get_format(). I was thinking this made sense as it would be good to pick up the missing header information at this point. What would be the best way to check if the header already contains the required information?

Where would you expect HTSlib to fail if trying to write out a CRAM file with and without the required information set?

I can also set up a hts_set_fai_filename() wrapper. From the comment in the header file it looks like this is for read scenarios only, but the code looks like it will do an update, but only if there is not an existing value in the field. I'm guessing that it is better to not give the user the ability to override the existing UR value - would this be the case?