ANHIG / IMGTHLA

Github for files currently published in the IPD-IMGT/HLA FTP Directory hosted at the European Bioinformatics Institute
http://www.ebi.ac.uk/ipd/imgt/hla/

Release version as JSON and/or XML #356

Closed zabeen closed 7 months ago

zabeen commented 7 months ago

Hi

I just spotted the release_version.txt file, which is really useful, many thanks for adding that.

Kindly requesting a second file containing the same data, but in a fixed, defined schema optimised for app consumption.

e.g., release_notes.json

{
  "version": "3540",
  "date": "2023-10-12"
}

<?xml version="1.0" encoding="UTF-8"?>
<release>
  <version>3540</version>
  <date>2023-10-12</date>
</release>

My personal preference is JSON, but I would greatly appreciate either!

TIA

dominicbarkerAN commented 7 months ago

Hi Zabeen, thank you for your query. As you have noted, the information you are looking for is available in the release_version.txt file, whose format is consistent with the headers of the other .txt files in this repository. If you could provide more information on your use case, we can look for a suitable solution. However, we are reluctant to build new file formats containing information that is already available in the existing files.

zabeen commented 7 months ago

Hi,

Thanks for your reply.

The Anthony Nolan search algorithm, Atlas, consumes various files from the IMGT/HLA repo. It needs to do this every time a new IMGT/HLA version is detected. It currently determines the version number in a different way, as the application was developed several years ago, well before the release_version.txt file was added.

Text headers are essentially free text, which is optimal for human "consumption", but they are liable to changes in format or wording that would break prod code.

Data represented by a schema, like JSON or XSD, can be easily modelled within a consumer app and also version controlled, which makes it resistant to breaking changes.
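To illustrate what I mean by modelling the data in a consumer app, here is a minimal Python sketch that loads the JSON example from my first comment into a typed object. The names `ReleaseInfo` and `parse_release_info` are hypothetical, and the two-field layout is only the one I proposed above, not an existing file.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseInfo:
    """Typed view of the proposed (hypothetical) release JSON document."""
    version: str
    released: date


def parse_release_info(text: str) -> ReleaseInfo:
    """Parse the JSON text and fail loudly if the expected fields are absent."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    missing = {"version", "date"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # date.fromisoformat raises ValueError if the date is not YYYY-MM-DD.
    return ReleaseInfo(version=str(raw["version"]),
                       released=date.fromisoformat(raw["date"]))


example = '{"version": "3540", "date": "2023-10-12"}'
info = parse_release_info(example)
```

A change to the field names or the date format then surfaces as a `ValueError` at load time rather than as a silently wrong version further downstream.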

BW

Zabeen

dominicbarkerAN commented 7 months ago

Hi Zabeen,

The release_version.txt file, along with the headers of the other txt files, is standardised in this repository and is not liable to change. The format has been consistent since it was introduced in 2020 at the request of users to serve this function.

Based on your requirements, I believe release_version.txt is likely to be the best way to access the information you need. However, the current solution in Atlas will continue to work, as the Allele_history.txt file will continue to be supported.
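As a rough sketch of how a consumer might read such a header, the Python below collects leading comment lines into a dictionary. It assumes header lines of the form `# key: value`, which is an assumption about the layout and should be checked against the actual file. The function name `parse_header` and the sample text are made up for illustration.

```python
def parse_header(text: str) -> dict[str, str]:
    """Collect leading '# key: value' lines into a dict (assumed layout)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        # Split on the first colon only, so values may contain colons (e.g. URLs).
        key, sep, value = line.lstrip("#").partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical sample, NOT copied from the real release_version.txt.
sample = "# file: release_version.txt\n# date: 2023-10-12\n# version: 3.54.0\n"
header = parse_header(sample)
```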

The release version could also be obtained by extracting it from the release tag of the new hla.xml format, which is named hla_new.xml in 3.55.0 but will replace hla.xml from 3.56.0 onwards. Alternatively, it could be obtained from our Allele Query API, which provides responses in JSON format. Documentation on our Allele API can be found here:

https://www.ebi.ac.uk/ipd/imgt/hla/about/help/api/

Here is an example query that will always include the latest release version:

https://www.ebi.ac.uk/cgi-bin/ipd/api/allele?query=eq(accession,HLA00001)&fields=release_version,release_date
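A minimal Python sketch of calling that query from an application might look like the following. The exact shape of the JSON response is not described in this thread, so the sketch does not assume one: the hypothetical helper `find_field` simply searches the decoded document for the first `release_version` key. `build_url` and `fetch_release_version` are likewise illustrative names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_URL = "https://www.ebi.ac.uk/cgi-bin/ipd/api/allele"


def build_url(accession: str = "HLA00001") -> str:
    """Rebuild the example query URL for a given accession."""
    query = quote(f"eq(accession,{accession})", safe="(),")
    return f"{API_URL}?query={query}&fields=release_version,release_date"


def find_field(node, key):
    """Depth-first search for the first non-null value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_field(child, key)
        if found is not None:
            return found
    return None


def fetch_release_version(timeout: float = 10.0):
    """Call the API and return the release version, or None if not present."""
    with urlopen(build_url(), timeout=timeout) as response:
        payload = json.load(response)
    return find_field(payload, "release_version")
```

Calling `fetch_release_version()` performs the network request; a caller would typically compare the result with the last version it processed.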

zabeen commented 7 months ago

Thanks for the info. I'll look into it.