RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

How to get the latest reference? #61

Closed: slowkow closed this issue 2 years ago

slowkow commented 3 years ago

Here's what I tried:

git clone git@github.com:RabadanLab/arcasHLA.git
cd arcasHLA

./arcasHLA reference --update

But I got an error:

Traceback (most recent call last):
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 536, in <module>
    build_convert(False)
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 464, in build_convert
    p_group = process_hla_nom(hla_nom_p)
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 234, in process_hla_nom
    for line in open(hla_nom, 'r', encoding='UTF-8'):
FileNotFoundError: [Errno 2] No such file or directory: '/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/../dat/IMGTHLA/wmda/hla_nom_p.txt'

Next, I tried this:

./arcasHLA reference --version 3.36.0

But I got an error:

Traceback (most recent call last):
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 548, in <module>
    build_fasta()
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 416, in build_fasta
    utrs, exons, final_exon_length) = process_hla_dat()
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 134, in process_hla_dat
    with open(hla_dat, 'r', encoding='UTF-8') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/../dat/IMGTHLA/hla.dat'

slowkow commented 3 years ago

I tried the command in the README file, but it also fails with the same error:

./arcasHLA reference --version 3.24.0
Traceback (most recent call last):
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 548, in <module>
    build_fasta()
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 416, in build_fasta
    utrs, exons, final_exon_length) = process_hla_dat()
  File "/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/reference.py", line 134, in process_hla_dat
    with open(hla_dat, 'r', encoding='UTF-8') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ks38/work/github.com/RabadanLab/arcasHLA/scripts/../dat/IMGTHLA/hla.dat'

slowkow commented 3 years ago

I think the issue was that I did not have git-lfs installed.

So, here's how I installed git-lfs:

wget https://github.com/git-lfs/git-lfs/releases/download/v2.13.2/git-lfs-linux-amd64-v2.13.2.tar.gz
tar xf git-lfs-linux-amd64-v2.13.2.tar.gz
cd git-lfs
PREFIX=$HOME/.local ./install.sh

Next, I ran this command again:

./arcasHLA reference --update

And it seems to work! This time, the file dat/IMGTHLA/hla.dat was created.

It might be helpful for newcomers to add a check for git-lfs somewhere in the arcasHLA scripts.
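A minimal sketch of what such a check could look like, written in Python because the scripts in the tracebacks above are Python. The function names (git_lfs_available, is_lfs_pointer, require_git_lfs) are hypothetical and not part of arcasHLA. The sketch only checks whether a git-lfs executable is on PATH, and whether a data file is a small Git LFS pointer (which can be left behind when a repository is cloned without git-lfs) rather than the real file.

```python
import shutil
import sys

# Every Git LFS pointer file starts with this line (Git LFS pointer spec v1).
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def git_lfs_available() -> bool:
    """Return True if a `git-lfs` executable can be found on PATH."""
    return shutil.which("git-lfs") is not None


def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` looks like an un-fetched Git LFS pointer file.

    Missing or unreadable files return False.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read(len(LFS_POINTER_HEADER)) == LFS_POINTER_HEADER
    except OSError:
        return False


def require_git_lfs() -> None:
    """Exit with a readable message instead of a FileNotFoundError traceback."""
    if not git_lfs_available():
        sys.exit(
            "git-lfs was not found on PATH; install it before running "
            "`arcasHLA reference`."
        )
```

A script could call require_git_lfs() before building the reference, and call is_lfs_pointer() on a data file such as dat/IMGTHLA/hla.dat to catch a clone made without git-lfs.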

tpereachamblee commented 3 years ago

Hello! Thanks for your interest in our tool. Would you mind confirming which version of the tool you are using? Although you included a command that appears to clone the current master branch of this repository, the issue you reported, the errors you are receiving, and the suggested fix are more consistent with the latest tagged release, and they have been discussed and addressed in issue #32.

That said, the behavior you should expect if you clone the master branch is included below. With commit 0beeb83, I have updated the Dockerfile in the Docker folder so that it no longer installs the now-deprecated git lfs (which will no longer be a dependency of arcasHLA). You should be able to build and run the container to reproduce the following. (Note that running the reference command with --update currently fails, as does running it with any version above 3.34.0; this behavior is discussed in issue #59 and will be corrected in an upcoming release.)

tpereachamblee commented 2 years ago

Hello! Thanks for your interest in our tool. A new release (v0.3.0) with up-to-date IMGT/HLA versions and commit hashes is now available. Between major releases, users can also use the --commit flag to pass the hash of a more recent IMGT/HLA release manually:

arcasHLA reference --commit <IMGT/HLA_commit_hash>

Note also that the --update flag always pulls the latest IMGT/HLA version.
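One way to find a hash to pass to --commit is to list the refs of the IMGT/HLA Git repository with `git ls-remote`. The sketch below assumes that arcasHLA pulls from the IMGT/HLA GitHub repository (https://github.com/ANHIG/IMGTHLA.git) and that a commit hash from that repository is what --commit expects; both are assumptions, not statements from the maintainers. The helper names parse_ls_remote and list_remote_refs are hypothetical.

```python
import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map each ref name to its commit hash, given `git ls-remote` output.

    Each non-empty line has the form "<hash>\t<ref>".
    """
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, ref = line.split("\t", 1)
        refs[ref] = commit
    return refs


def list_remote_refs(url: str) -> dict[str, str]:
    """Run `git ls-remote <url>` (requires git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", url], capture_output=True, text=True, check=True
    )
    return parse_ls_remote(result.stdout)
```

For example, list_remote_refs("https://github.com/ANHIG/IMGTHLA.git") returns a dictionary of refs. The hash for the desired branch or tag could then be passed as arcasHLA reference --commit <IMGT/HLA_commit_hash>.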