The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

SO sponsorship for submission of bioinformatics file formats for IANA Media Types ("MIME") registry? #646

Open photocyte opened 8 months ago

photocyte commented 8 months ago

Continuation of this email thread with keilbeck&genetics.utah.edu & evan.christensen&utah.edu, from tfallon&ucsd.edu. That thread concluded that we should attempt an SO-sponsored submission of bioinformatics file formats to the IANA Media Types ("MIME") registry.

This thread can be used for public comment on the SO-sponsored submission of bioinformatics file formats. Private comments can be sent to the above email addresses.

This issue comment is currently a stub listing the file formats targeted for submission. I will keep editing the comment to expand the table, and will ping folks once it is done.

| file format | wikipedia link | IANA registry status | owner | contact email |
| --- | --- | --- | --- | --- |
| gff3 | https://en.wikipedia.org/wiki/General_feature_format | accepted to Standards Tree (https://www.iana.org/assignments/media-types/text/gff3) | SO | song-devel&lists.utah.edu |
| fasta | https://en.wikipedia.org/wiki/FASTA_format | TBD | ? | ? |
| fastq | https://en.wikipedia.org/wiki/FASTQ_format | TBD | ? | ? |
| vcf | https://en.wikipedia.org/wiki/Variant_Call_Format | TBD | Global Alliance for Genomics and Health (GA4GH) | ? |
| bcf | N/A | TBD | ? | ? |
| sam | https://en.wikipedia.org/wiki/SAM_(file_format) | TBD | ? | ? |
| bam | https://en.wikipedia.org/wiki/Binary_Alignment_Map | TBD | ? | ? |
| cram | https://en.wikipedia.org/wiki/CRAM_(file_format) | TBD | ? | ? |
| genbank | N/A | TBD | NCBI | ? |
| gfa | N/A | TBD | ? | ? |
| bed | https://en.wikipedia.org/wiki/BED_(file_format) | TBD | ? | ? |

To my understanding, SO can speak on behalf of, i.e. submit, file formats it does not "own" to the IANA Media Types registry. I looked through the IANA procedures, most recently spelled out in RFC 6838: https://www.rfc-editor.org/rfc/rfc6838.html

Per that RFC, for a Media Type (the current name for MIME types) to be accepted into the “Standards Tree” of the IANA registry, and thus carry no additional prefix on the media type name (vnd. for the “Vendor tree”, prs. for the “Personal tree”), it has to be submitted by a “Standards Organization”.
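To make the prefix convention concrete, here is a minimal Python sketch that reads the registration tree off a media type name. The function name `registration_tree` and the `vnd.example...` / `prs.example...` names are hypothetical and used only for illustration; the `x.` prefix for the unregistered tree also comes from RFC 6838, although it is not discussed above. Note that a name without a prefix only *looks* like a Standards Tree name; this does not check whether it is actually registered with IANA.

```python
def registration_tree(media_type: str) -> str:
    """Guess the RFC 6838 registration tree from a media type name.

    This only inspects the name; it does not consult the IANA registry.
    """
    top_level, slash, subtype = media_type.partition("/")
    if not (top_level and slash and subtype):
        raise ValueError(f"not a type/subtype media type: {media_type!r}")

    # The tree is signalled by a facet: the part of the subtype before the first dot.
    facet, dot, _rest = subtype.partition(".")
    faceted_trees = {
        "vnd": "vendor",
        "prs": "personal",
        "x": "unregistered",
    }
    if dot and facet.lower() in faceted_trees:
        return faceted_trees[facet.lower()]

    # No recognized facet: the name has the shape of a Standards Tree name.
    return "standards"


if __name__ == "__main__":
    # text/gff3 is the registered type from the table; the others are made-up names.
    for name in ("text/gff3", "application/vnd.example.bam", "text/prs.example.fastq"):
        print(name, "->", registration_tree(name))
```

So a format registered without organizational sponsorship would end up with a name like the hypothetical `application/vnd.example.bam`, rather than an unprefixed name like `text/gff3`.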

Seeing as gff3 was previously submitted by SO and accepted, I assume that means SO is an accepted standards organization in IANA’s eyes. Since gff3 is the only bioinformatics file format that has been submitted to IANA, SO is currently the only vetted bioinformatics-related standards organization for this process. The RFC prefers that the submitting standards organization “owns” the file format, but does not require it. There are procedures in the RFC to resolve ownership in case of a dispute.

egchristensen commented 5 months ago

GA4GH may be a better forum for this discussion and for organizing sponsorship. They're currently managing BED, CRAM, VCF, and SAM/BAM. While the GFF3 specification has been published on SO's GitHub for a while now, it would make sense for it to be formalized and published on GA4GH's site along with the other bioinformatics standards. GA4GH has the resources to facilitate discussion and provide long-term stewardship of file format specifications.

I represent SO in the Sequence Annotation study group (part of the GA4GH Genomic Knowledge Work Stream). If you like, I can ping someone at the GA4GH secretariat to see if this is something that would fall within GA4GH's scope. If GA4GH isn't vetted as a standards organization at IANA yet, we may want to take care of that first.

photocyte commented 5 months ago

Thanks @egchristensen! It’d be great if you could ping the GA4GH secretariat. I have some IANA templates filled out but not yet submitted, if that would help. I don’t feel strongly about being the submitter; I was just surprised it hadn’t been done yet.

edit: thanks Eric for sending that email. I'm making a note here to link out to the email thread (the link keeps the thread confidential; it's just a way for me to easily access it in my own email client): https://hookmark.net/hm/hook/email/BYAPR11MB2854070C78C7440AA0C73ED5FEFA2%40BYAPR11MB2854.namprd11.prod.outlook.com