RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

IMGT/HLA version 3.56.0 and onwards provides large files as zip files #133

Open Carovanandel opened 7 months ago

Carovanandel commented 7 months ago

Hi,

From version 3.56.0 onwards, the IMGT/HLA reference provides large files as zip files. The IMGT/HLA GitHub page states: "As of Release 3.56.0, due April 2024, all large files (>100MB) will be provided as compressed files rather than utilise Git LFS, which was previously required. This includes the hla.dat, xml/hla.xml and xml/hla_ambigs.xml in the next release. This has been done to simplify the cloning process and also due to escalating and unpredictable costs in providing the files using Git LFS from a public repository. All compressed files will use the [ZIP format](https://en.wikipedia.org/wiki/ZIP_(file_format)). This formatting change will be applied to all branches."

This breaks arcasHLA, because files such as hla.dat cannot be found now that they are zipped. IMGT/HLA versions up to 3.55.0 seem to work fine. I have created a pull request that adds the missing IMGT/HLA versions 3.47.0-3.56.0 to the reference list in parameters.json, so that the arcasHLA reference --version command accepts them. From 3.56.0 onwards, however, the command no longer works. Could you update your code to handle the zipped files?

Thanks in advance!

kalanir commented 4 months ago

I am also currently running into this issue! I keep getting the error: FileNotFoundError: [Errno 2] No such file or directory: '/home/arcas-hla-0.5.0-1/scripts/../dat/IMGTHLA/hla.dat' when in fact hla.dat.zip exists.
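For illustration, here is a minimal Python sketch of how a loader could fall back to the zipped file when the plain one is missing. It is not arcasHLA's actual code: the helper name `open_maybe_zipped` is hypothetical, and it assumes that `hla.dat.zip` holds a single top-level member named `hla.dat` and that the file is UTF-8 text.

```python
import io
import zipfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_maybe_zipped(path, encoding="utf-8"):
    """Yield a text handle for `path`, or for `path` + '.zip' if only that exists.

    Hypothetical helper: assumes the archive member has the same name as the
    unzipped file (e.g. hla.dat inside hla.dat.zip).
    """
    path = Path(path)
    zipped = path.with_name(path.name + ".zip")
    if path.exists():
        # Plain file is present (releases before 3.56.0, or already unzipped).
        with open(path, "r", encoding=encoding) as handle:
            yield handle
    elif zipped.exists():
        # Stream the member straight out of the archive without extracting it.
        with zipfile.ZipFile(zipped) as archive, archive.open(path.name) as raw:
            yield io.TextIOWrapper(raw, encoding=encoding)
    else:
        raise FileNotFoundError(f"neither {path} nor {zipped} exists")
```

It would be used as `with open_maybe_zipped("dat/IMGTHLA/hla.dat") as handle:` followed by the usual line-by-line parsing. Reading from the archive avoids writing the large unzipped file to disk, at the cost of decompressing it on every run.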

tyxdavid commented 3 weeks ago

I also encountered the same issue. A workaround is to download the IMGTHLA database as usual and manually unzip the zipped files under dat/IMGTHLA/. Then replace reference.py in scripts/ with the attached script, and run 'arcasHLA reference --update_static' to write the necessary files for the analysis. reference.zip
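The manual unzip step can also be scripted. Below is a minimal Python sketch, assuming each archive under dat/IMGTHLA/ should be extracted into the directory it sits in; the function name `unzip_in_place` is made up for this example and is not part of arcasHLA or the attached script.

```python
import zipfile
from pathlib import Path


def unzip_in_place(root):
    """Extract every .zip archive found under `root` next to the archive itself.

    Returns the paths of the extracted members. Existing files with the same
    name are overwritten.
    """
    extracted = []
    for archive_path in sorted(Path(root).rglob("*.zip")):
        with zipfile.ZipFile(archive_path) as archive:
            # Extract into the archive's own directory, e.g. xml/hla.xml.zip -> xml/hla.xml
            archive.extractall(archive_path.parent)
            extracted.extend(archive_path.parent / name for name in archive.namelist())
    return extracted
```

Calling `unzip_in_place("dat/IMGTHLA")` from the arcasHLA directory should leave hla.dat, xml/hla.xml and xml/hla_ambigs.xml next to their zip files, provided the archives store those files at their top level.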