Bioconductor / Rhtslib

HTSlib high-throughput sequencing library as an R package
https://bioconductor.org/packages/Rhtslib
11 stars 12 forks source link

Please do not expose HTSlib's internal CRAM functions #30

Closed jmarshall closed 2 years ago

jmarshall commented 2 years ago

I notice that the recent 76c5c2362a272e9d46638638851cb78aebdc8768 makes a number of HTSlib's private internal header files available to Rhtslib users: the internal cram/*.h headers and the internal header.h; meanwhile the internal _htsinternal.h, and _textutilsinternal.h have already been made available since Rhtslib 1.14.0.

Only the API functions in HTSlib's *public headers in _htslib-x.y/htslib/.h_** are supported stable interfaces and not subject to change.

As @jkbonfield can confirm, HTSlib's public CRAM-specific interfaces are those defined in htslib/cram.h. You do not need to install any of the internal headers in cram/*.h and should not, as they are likely to change incompatibly without notice. If any dependent R libraries do need functionality from these internal headers, you should raise an issue on HTSlib to have the desired functionality or an equivalent added to the public htslib/cram.h header.

Also header.h is an internal HTSlib header that is likely to change incompatibly without notice. The supported public interfaces to this header functionality are in htslib/sam.h, so there is no need to install the internal header.h header file.

Similarly there is no supported public functionality in _htsinternal.h or _textutilsinternal.h — the clue's in the name! — so there should be no need to make these available to Rhtslib users either. I see these have been being installed since Rhtslib 1.14.0 — what was the rationale for this? Have they ever been used by an R library?

hpages commented 2 years ago

Thanks for the feedback. I think we can get rid of the cram/*.h files. AFAIK no Bioconductor package is using them yet. Then we should also be able to get rid of header.h, which was only installed because cram/cram.h needs it.

I'll also look into getting rid of textutils_internal.h and hts_internal.h. They were added a long time ago but maybe some Bioconductor packages are still using them, in which case those packages will first need to be modified.

@mtmorgan It looks like in the cram branch of Rsamtools you're including cram/cram.h: https://github.com/Bioconductor/Rsamtools/blob/6bab10b94573cc6cdc5f9aa0585f9246dae9b7ca/src/hts_utilities.c#L2 This is the only reason I exposed the cram/*.h files in the latest Rhtslib. Do you think you can use htslib/cram.h instead? Thanks.

mtmorgan commented 2 years ago

Yes I'll adjust

hpages commented 2 years ago

Ok so I reverted commit 76c5c2362a272e9d46638638851cb78aebdc8768. See commit 14645306bf45504caca6dec13e07693e23a5b6f3.

Turns out that we still need to install hts_internal.h (and consequently textutils_internal.h, which is used in hts_internal.h) because Rsamtools makes calls to hts_idx_getfn() (see here), a function that is declared in hts_internal.h.

@jmarshall Any chance hts_idx_getfn() can be moved to HTSlib's public API, or should we use something else in Rsamtools?

Thanks, H.

jmarshall commented 2 years ago

There's no particular need to preflight the index filename. I would suggest using bcf_index_load() instead of bcf_index_load2(), and rephrasing the resulting error message as per the previous one:

        bfile->index = bcf_index_load(fn);
        if (bfile->index == NULL) {
            …
            Rf_error("no valid VCF/BCF index found\n  filename: %s", fn);
hpages commented 2 years ago

Ah I see. This code is a relic from the samtools 0.1.19 era when it seems that the only way to load the BCF index was to call bcf_idx_load() on the index filename. When Rsamtools got migrated from samtools 0.1.19 to htslib 1.7 a few years ago, this code didn't get updated in a way that would take advantage of bcf_index_load().

I made the change in Rsamtools 2.13.3. Thanks for the suggestion.

Also modified Rhtslib to no longer install textutils_internal.h and hts_internal.h. This is in Rhtslib 1.99.5 (commit a5335e1fea92ea06c33d42c7e290b6d8b4245ad3).

Cheers, H.