Closed riederd closed 8 years ago
Many thanks
Error: need reference sequence file for writing CRAM file
This is not true: when writing CRAM, if the @SQ
headers contain UR
or M5
fields, a reference sequence file is not necessary. Most outside-htslib code should not need to know whether a file is SAM, BAM, or CRAM (and if it does it should use hts_get_format()
, not htsfile->format.format
), so I suggest you just remove the if( htsfile->format.format == cram )
and else croak
.
If it were me, I would add another Bio::DB::HTSfile method that wrapped hts_set_fai_filename()
instead of having an optional header_write
argument, as it can be useful in other circumstances than writing headers as well (e.g., when reading SAM).
Hi @jmarshall, thanks for those comments. I've just got some questions about implementing further updates.
I think we will need some CRAM specific checking to see if the reference is specified somehow. The user may be creating a file from a blank template or if converting from a SAM/BAM the M5/UR headers lines are optional. But I will update to use hts_get_format(). I was thinking this made sense as it would be good to pick up the missing header information at this point. What would be the best way to check if the header already contains the required information?
Where would you expect HTSlib to fail if trying to write out a CRAM file with and without the required information set?
I can also set up a hts_set_fai_filename() wrapper. From the comment in the header file it looks like this is for read scenarios only, but the code looks like it will do an update, but only if there is not an existing value in the field. I'm guessing that it is better to not give the user the ability to override the existing UR value - would this be the case?
Hi Rishi,
I followed the suggestions on my last pull request (#34). Shortly:
I hope this of help to others too and is now better suited for merging into the main branch.
Dietmar