Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools

CRAM support - why not? #56

Open pd3 opened 1 year ago

pd3 commented 1 year ago

It appears CRAM is not supported by Rsamtools. I wonder why that is? Samtools supports CRAM natively, so I would naively expect adding it to be a trivial task, no?

This issue came up in other programs that rely on Rsamtools; please see https://github.com/vplagnol/ExomeDepth/issues/17

vjcitn commented 1 year ago

Thanks for your note. There hasn't been strong demand for CRAM functionality. Rhtslib is up to htslib 1.15, while htslib itself has moved on to 1.18. Resources in Bioconductor core are very tight, and it has not been clear whether Rsamtools should be upgraded to handle CRAM or whether another htslib-dependent package should be introduced. If you see a way to modify Rsamtools to deal with CRAM in a way that is useful to you, we would surely work with a PR.

hpages commented 1 year ago

FWIW some work was done by @mtmorgan last year to support CRAM, see the cram branch, but I don't know how far Martin went or how much work would still be needed before this branch can be merged into devel.

hpages commented 9 months ago

UPDATE: The latest Rhtslib (2.99.2) now includes htslib 1.18 (was 1.15). See https://github.com/Bioconductor/Rhtslib/blob/devel/DESCRIPTION.

I don't know how this update will impact CRAM support in Rsamtools though.

vjcitn commented 9 months ago

@pd3 please have a look at the cram branch noted in previous comments, and supply information on the use case and any errors encountered. Thanks.

vjcitn commented 9 months ago

As noted some time ago, our core has not encountered significant demand for CRAM support. It would be logical to add such to Rsamtools, but efforts to do this have not come to closure. I followed instructions at http://www.htslib.org/workflow/cram.html to make a fresh example, but ran into errors of this nature:

[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6681ac2f62509cfc220d78751b8dc524": Input/output error

That cropped up while exploring a role for pysam in providing CRAM support. I'd really like to see some community input on use cases and good example files to make a case for a solution, which I suspect could involve reticulate/basilisk with pysam.
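
The errors quoted above arise when htslib tries to download a reference sequence by its MD5 checksum. One possible way to avoid the network lookup, assuming the reference FASTA is available locally, is to pre-populate a local reference cache keyed by those checksums. The sketch below uses only the Python standard library; the function names (`clean_sequence`, `sequence_md5`, `read_fasta`, `populate_cache`) are hypothetical, and the directory layout is chosen to match a `%2s/%2s/%s` pattern in htslib's `REF_CACHE` and `REF_PATH` environment variables. It has not been tested against Rsamtools or pysam.

```python
import hashlib
import os


def clean_sequence(sequence):
    """Keep printable ASCII (codes 33..126) and uppercase the result,
    which is how the SAM specification normalises a sequence for its M5 tag."""
    return "".join(ch for ch in sequence if 33 <= ord(ch) <= 126).upper()


def sequence_md5(sequence):
    """Return the hex MD5 digest of the normalised sequence."""
    return hashlib.md5(clean_sequence(sequence).encode("ascii")).hexdigest()


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def populate_cache(fasta_path, cache_root):
    """Write each sequence to cache_root/ab/cd/<rest of md5>.

    Returns a dict mapping sequence name to its MD5 digest.
    The layout is an assumption matching REF_CACHE=<cache_root>/%2s/%2s/%s.
    """
    digests = {}
    for name, sequence in read_fasta(fasta_path):
        digest = sequence_md5(sequence)
        target_dir = os.path.join(cache_root, digest[:2], digest[2:4])
        os.makedirs(target_dir, exist_ok=True)
        with open(os.path.join(target_dir, digest[4:]), "w") as out:
            out.write(clean_sequence(sequence))
        digests[name] = digest
    return digests
```

With a cache populated this way, one would then set `REF_CACHE` (and, if desired, `REF_PATH`) to `<cache_root>/%2s/%2s/%s` before opening the CRAM file, so that the lookup is served from disk rather than from the EBI server. Whether this resolves the SSL error in a given environment is untested here.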