ga4gh / TASC

TASC aids the harmonisation of aspects of GA4GH's various products that would otherwise prevent different products from being used together conveniently.
https://www.ga4gh.org

Moving Crypt4GH into its own GitHub repo #40

Open AlexanderSenf opened 2 years ago

AlexanderSenf commented 2 years ago

The Crypt4GH encrypted file format standard is currently part of the hts-specs repo, where it lives next to the SAM, BAM, CRAM, VCF, and BED file format standards: https://github.com/samtools/hts-specs. It was originally placed there because it is essentially another file format standard (not an API, etc.). And samtools was actually the first tool that natively supported it.

However, there have been calls to move Crypt4GH into its own GitHub repo instead, which is reasonable. We have two questions:

1 - Is this a good idea? Does it fit with the larger view on standards in the GA4GH?

2 - What would be the best name/location for this new repo? (Is there a naming scheme that differentiates between standards, implementations, APIs, etc. in https://github.com/ga4gh/?)

jmarshall commented 2 years ago

The Crypt4GH specification is written in LaTeX and published as a PDF document.

The recently added BED specification is similarly LaTeX/PDF, and its authors considered whether they would be best served by joining LSG's other LaTeX/PDF specifications in the hts-specs repository or by creating a new individual repository under the github.com/ga4gh organisation.

In the end, the BED authors found the following arguments convincing:

> Like the SAM, CRAM, and VCF specifications, this BED specification is written in LaTeX and published as a PDF. The hts-specs repository has built up a lot of infrastructure for such documents: Makefile rules to generate the final PDF, to embed version information on the title page, and to generate diff PDFs (showing a formatted PDF rendition of the differences between commits); conventions around workflows; a robot that provides PDF previews on pull requests; and automatically publishing the updated PDFs on a web site.
>
> By maintaining the BED document in hts-specs, you would take advantage of all this infrastructure and improvements to it. Or if you do use a separate repository, you should consider copying some or all of this infrastructure and setting it up in that new repository.
>
> The other consideration is that there is a useful synergy that comes from sharing a workspace with the other documents. IMHO all of SAM/BAM, CRAM, and VCF/BCF have benefitted from the greater number of people working in the hts-specs repository: SAM people are interested in and comment on VCF changes and propose improvements and corrections to the VCF documents, and vice versa. I expect that the BED document would similarly benefit from the greater exposure that comes from the busier shared repository.

The same arguments apply to Crypt4GH IMHO, though I notice that the document has not been updated since it was added in 2019 and there is only one crypt4gh-labelled issue (which has garnered no comments). (So to date Crypt4GH has not gained much from that synergy, and in fact has barely benefitted from being in a GitHub repository at all!)

Is there an itemisation of the advantages to Crypt4GH of moving to its own GitHub repo?


> And samtools was actually the first tool that natively supported it.

As described at samtools.github.io, the GitHub samtools organisation is shared between what is now GA4GH LSG (samtools/hts-specs repo), Broad-related Java developers (samtools/htsjdk and samtools/htsjdk-next-beta repos), and Sanger-related C developers (samtools/htslib, samtools/samtools, samtools/bcftools et al). This is partly for historical reasons — e.g. the hts-specs repository predates GA4GH's formation.

So that's the reason for the organisational location; being supported first in the samtools tool is coincidental.